\documentclass[12pt,a4paper]{article}

\usepackage{amsmath, amsthm, amssymb, tikz, realboxes, bibentry, natbib, url, a4wide, graphicx, verbatim, setspace} 
\usepackage[affil-it]{authblk}
\usepackage{listings}

\lstset{
language=R,
basicstyle=\scriptsize\ttfamily,
commentstyle=\ttfamily\color{gray},
numbers=left,
numberstyle=\ttfamily\color{gray}\footnotesize,
stepnumber=1,
numbersep=5pt,
backgroundcolor=\color{white},
showspaces=false,
showstringspaces=false,
showtabs=false,
frame=single,
tabsize=2,
captionpos=b,
breaklines=true,
breakatwhitespace=false,
title=\lstname,
escapeinside={},
keywordstyle={},
morekeywords={}
}


\begin{document}
\author{Philipp Hunziker\thanks{Email: hunziker@icr.gess.ethz.ch}}

\affil{Center for Comparative and International Studies\\ ETH Z\"{u}rich}
\title{Does Petroleum Extraction Make Ethnic Identities Politically Relevant? \thanks{Prepared for the APSA Annual Meeting, August 29 -- September 1 2013 in Chicago. }}

\date{\today\\ \vspace{10 mm} \small{Preliminary version. Please do not cite without permission.}}
\maketitle

\abstract{Does petroleum extraction make ethnic identities politically relevant? This paper postulates that it does. Specifically, I argue that rent-seeking possibilities and negative externalities associated with large-scale resource extraction create incentives for ethnic mobilization in petroleum-rich regions. 
Moreover, I hypothesize that this process should lead to the emergence of particularly small ethnic groups. These arguments are tested on the basis of a spatial research design employing geo-coded data on politically relevant ethnic groups and productive petroleum fields.
Preliminary evidence suggests a robust association between petroleum extraction and the emergence of small, politically mobilized ethnic groups, but the effect is confined to ex-colonial countries in Sub-Saharan Africa and Asia.}

\newpage
\tableofcontents
\newpage
\onehalfspacing

\section{Introduction}
\label{Sec:1}

The Ogoni are an ethnolinguistic group native to Nigeria's Niger Delta. With approximately 750,000 members\footnote{This figure was estimated based on information provided in the EPR dataset, as described in section \ref{Sec:4}.} in a country of over 150 million, they constitute a minuscule minority. Their vanishingly small demographic weight within Nigerian society is emphasized by the composition of the country's dominant ethnic identities. Nigeria's post-colonial history has in large part been determined by the struggle over control of the central state between the Hausa-Fulani, Yoruba, and Igbo groups, which comprise approximately 30\%, 20\% and 20\% of the population, respectively.

Against these odds, by the early 1990s, Ogoni identity had become a highly salient basis for political mobilization, with several political organizations making explicit ethnic claims and receiving national and international attention \citep{HRW1995}. Though Ogoni mobilization has its origins in the 1940s, mass mobilization peaked in 1993, against the backdrop of Nigeria's gradual transition away from military dictatorship at the time. Regular mass protests were held, calling for increased autonomy in local governance, and even an independent Ogoni state.

The history of Ogoni mobilization is intricately interwoven with the Niger Delta's petroleum industry. Ogoni calls for autonomy have centered almost exclusively on claims related to oil production; central themes were the allegedly small share of oil revenue channeled back to local communities by the central state,\footnote{However, a policy granting oil-producing communities preferential access to oil revenue had apparently been in effect even before these vocal mobilization efforts \citep[p. 332]{Osaghae1995}.} and the devastating environmental and social burdens caused by the extraction process \citep{Osaghae1995}.

Reading the history of Ogoni mobilization, it is almost impossible to escape the conclusion that petroleum extraction has been its main underlying driving force. The same conclusion suggests itself when taking a comparative look at Nigeria's ethnopolitical landscape. Of the hundreds of language groups inhabiting the country, most either identify themselves with one of the major ethnic groups mentioned above, or are irrelevant to the national political scene. Hence, Ogoni exceptionalism can hardly be explained by a lack of possible ethnic delineations in the rest of the country. Rather, at least superficially, there seems to be an extremely strong spatial association between the concentration of Nigerian oil wealth in the comparatively small Niger Delta region, and the emergence of a demographically negligible ethnic group on the national political scene. The suspicion that petroleum is, at least partially, a determinant of the Nigerian ethnic landscape is further supported by evidence presented by \citet[p. 31]{Bhavnani2009}, who report survey results suggesting that ethnic self-identification in Nigeria is greatest in the Niger Delta region.

The goal of the present paper is to investigate whether the Ogoni case, or at least the provided interpretation of it, is generalizable, and if so, to what degree. Specifically, I address the following question: 

\emph{Does petroleum extraction lead to the emergence of politically salient ethnic identities? }

Why should we be interested in this particular relationship? First, I argue that gaining a better, more systematic understanding of the political consequences of petroleum extraction is a key requirement for anticipating likely future scenarios in regions facing resource windfalls. Should we, for instance, expect newly discovered oil and gas reserves in East Africa (specifically, Kenya, Uganda, Tanzania and Mozambique, \citealt{Economist2012}), to have an impact on the (sometimes delicate) ethnopolitical equilibria in the respective countries? 

Second, answering the proposed research question should improve our understanding of the determinants and mechanisms underlying the emergence of politically salient ethnic identities. In particular, providing evidence for a systematic link between petroleum extraction and politically relevant ethnicity would speak in favor of instrumentalist theories of ethnic salience. Given the immense wealth associated with large-scale petroleum extraction, it is difficult to conceive of an explanation of the proposed phenomenon that does not involve materialistic motivations as a contributor to ethnic mobilization.

Finally, the answer to the proposed research question may even have implications beyond the study of ethnic politics. Specifically, whether petroleum extraction affects the political salience of ethnic identities may be relevant to the literature addressing the various ``resource curses''. Natural resource abundance, and oil and gas in particular, has been linked to a variety of adverse social outcomes, such as intrastate conflict (e.g., \citealt{Ross2006}, \citealt{Lujala2010}), persistent non-democratic regimes (e.g., \citealt{Ross2001}), and slow economic growth (e.g., \citealt{Sachs2001}). Finding that petroleum affects the emergence of politically relevant ethnicity might provide important theoretical inputs to these research agendas. Perhaps, indeed, some of the negative effects attributed to petroleum abundance run through the creation of ethnic cleavages. Moreover, a statistical link between petroleum and ethnic salience would suggest that ``ethnicity'' should not be interpreted as an exogenous factor when analyzing the policy implications of petroleum wealth and extraction. In particular, this would suggest that testing resource-related hypotheses while ``controlling'' for ethnicity as a competing explanation for the outcome under scrutiny, as, for instance, practiced by \citet{Collier1998} in their analyses of civil war, is highly inappropriate.

In the remainder of this paper, I attempt to answer the proposed research question by adopting a quantitative research design. In particular, based on spatially defined units of analysis, I will attempt to estimate whether, on a global scale, we are more likely to see politically relevant (and geographically concentrated) ethnic groups in petroleum-rich locations, and whether these groups tend to be particularly small. 

It is legitimate to ask whether this approach is preferable to the available alternatives. Specifically, there is no lack of potentially interesting cases, such as the Angolan enclave of Cabinda \citep{Porto2003}, the Indonesian province of Aceh \citep{Kell2010}, and the Ecuadorian Oriente \citep{Steyn2003}, which appear to feature the proposed link between petroleum extraction and ethnic mobilization, and lend themselves to qualitative inquiry. The primary motivation for adopting the present framework is to avoid the pitfalls of selection bias. Qualitative analysis, even if comparative, creates the difficult challenge of appropriately identifying and evaluating the ``dogs that didn't bark'', that is, negative cases where the outcome of interest did not materialize. The geographical research design proposed in this paper alleviates this problem, thus providing the basis for systematic causal inference.

The remainder of this paper is structured as follows. The next section provides a brief overview of the existing literature on the empirical relevance, and the emergence, of politically salient ethnic identities. Section \ref{Sec:3} introduces a theoretical framework of the possible mechanisms leading from petroleum extraction to ethnic mobilization, and derives corresponding hypotheses. Next, section \ref{Sec:4} introduces the geographical research design at the core of this paper, whereas section \ref{Sec:5} discusses the econometric challenges it entails. Finally, results are discussed in section \ref{Sec:6}.



\section{Literature Review}
\label{Sec:2}

\subsection{Salience Matters}

In recent years, a growing body of empirical research has explicitly addressed the idea that political salience is a key component to understanding the link between ethnicity and a wide range of policy outcomes. Specifically, it is increasingly accepted that for understanding the role of ethnicity in shaping politics and policy, it is of crucial importance to distinguish between what one may call the ethnic ``source material'' of a country, that is, the pool of religious, linguistic and phenotypical categories that may potentially serve as the basis for ethnic identification, and those ethnic identities that actually serve as salient cleavages in the policymaking process. 

The idea that ethnic identity is context-dependent is not particularly new. Indeed, constructivists have long argued that framing ethnicity as a demographic constant is misleading (see \citealt{Fearon2000} for an overview). Rather, the constructivist argument goes, ethnicity is best understood as the result of a social process; that is, ethnicity does not naturally group individuals into internally coherent and externally incompatible categories, but is only as meaningful as humans make it through discourse and actions that reiterate the idea of otherness \citep[p. 848]{Fearon2000}. Consequently, depending on the time of analysis, or even the issue under scrutiny, individuals belonging to two social categories may consider this distinction highly relevant and associated with strong expectations about the other's political preferences and actions, or completely meaningless. Once we accept this premise, it seems trivial to conclude that any analysis of the effects of ethnic diversity on policy outcomes should attempt to distinguish between meaningful ethnic cleavages and purely anthropological social categories \citep{Laitin2001}. However, partly because collecting data on salient ethnic identities is cumbersome, and partly because of disciplinary boundaries, the constructivist insight that salience should be accounted for has only relatively recently found its way into quantitative analyses of the role of ethnic diversity for policy outcomes.

In accordance with constructivist expectations, those studies that do attempt to identify politically relevant ethnicity generally find that salience matters. \citet{Posner2004}, for instance, criticizes the widespread use of the ELF (Ethno-Linguistic Fractionalization) index in cross-country growth regressions (see, e.g., Alesina and La Ferrara 2004), and finds that a time-variant ethnic fractionalization index that only incorporates politically salient ethnic groups outperforms the ELF in explaining macroeconomic policy and long-run growth rates in Africa. Similarly, \citet{Cederman2007} and \citet{Cederman2010} criticize the use of the ELF index in studies addressing the onset of violent intrastate conflict. They argue that previous claims by \citet{Collier1998} and \citet{FearonLaitin2003} on the apparent irrelevance of ethnic identities for explaining the outbreak of civil war may have been premature. Rather, \citet{Cederman2010} argue that the latter authors' non-findings are related to their use of the ELF index, which fails to capture those ethnic cleavages that are relevant for explaining the outbreak of ethnic conflict. Indeed, using a data set that explicitly identifies politically relevant ethnic groups and their access to state power,\footnote{Incidentally, this is also the data set underlying the empirical analysis in the present paper.} \citet{Cederman2010} find that there is a strong relationship between ethnopolitical exclusion and civil conflict.

\subsection{Explanations of Ethnic Salience}

Naturally, if it is the case that political salience mediates the link between ethnicity and policy outcomes, this raises the question of why and when social identities become politically relevant in the first place. The goal of the present paper is to contribute to answering this question by investigating the role of a very specific phenomenon -- petroleum extraction --  in creating politically salient ethnic identities. 

Although, to my knowledge, this paper is the first to analyze explicitly the effect of petroleum on ethnic salience, it builds on an extensive body of literature from several disciplines that tries to explain political identity formation more generally. Following \citet{Fearon2000}, it is helpful to group this literature into three more or less distinct theoretical frameworks.

First, one prominent approach that addresses the roots of political identity formation consists of the macrohistorical accounts of the emergence of nationalism by \citet{Deutsch1953}, \citet{Gellner1983} and \citet{Anderson2006}. These authors focus explicitly on nationalist identities (in contrast to ethnic identities, which need not be nationalist), and argue that nationalism has emerged as the product of long-term social processes, such as the rise of the modern state system, economic modernization, and mass communication. These processes are argued to have made nationalist identification appealing for emergent mass publics.

A second strand of literature that tries to understand the emergence of social identities focuses on the role of discourses in shaping individuals' perceptions of group membership. Here, it is argued that ``individuals are pawns or products of discourses that exist and move independently of the actions of any particular individual'' \citep[p. 851]{Fearon2000}.  

The third approach, which has attracted increased attention in recent years and is most immediately relevant for the research question at hand, consists of instrumentalist explanations of ethnic salience. These theories hold that politically relevant ethnic cleavages emerge because ethnic mobilization is beneficial for some or all group members.

In contrast to the macrohistorical approach in the classic literature on nationalism, these theories look more specifically at the issue of the emergence of multiple ethnic identities within the same polity (in contrast to a unifying nationalist identity), and operate on a much shorter time frame. Here, the incentives for identifying as part of a particular ethnic group in the political process are located at the level of more or less immediate political payoffs, rather than arising from sweeping social transformations. In contrast to the discourse theoretic approach, instrumentalist theories postulate a clear, individualistic means-ends calculation at the center of the identity formation process, rather than locating agency at the supra-individual level. 

One body of literature within this instrumentalist framework attempts to explain the emergence of politically salient ethnic identities by focusing on political elites who incite inter-ethnic violence for their own political survival (\citealt{DeFigueiredo2000}, \citealt{Fearon2000}). Other, more recent publications locate incentives for ethnic mobilization not at the elite level, but at the level of mass publics, and focus less narrowly on episodes characterized by political violence. These approaches model ethnic salience as the product of a process that resembles a strategic choice scenario: opting for political mobilization along a certain ethnic identity occurs because it is beneficial for its members, especially in the context of gaining access to state-funded public goods or achieving otherwise favorable policy outcomes. Different authors have proposed various factors that determine when individuals choose to mobilize along a certain ethnic cleavage. \citet{Posner2007}, for instance, highlights how political institutions set incentives to organize along certain ethnic dimensions. \citet{Esteban2008} argue that individuals choose to compete over government resources in ethnic coalitions if there is substantial intra-group economic inequality.

Most relevant for the research question at hand, however, are the contributions by \citet{Fearon1999} and \citet{Caselli2006}, henceforth collectively referred to as ``FCC''. These authors argue that individuals choose to adopt ethnic identities in order to form effective minimal winning coalitions in the contest over government resources. Similar to other instrumentalist theories of ethnic salience, these authors start by assuming that individuals organize along ethnic lines in order to gain access to government resources that can be selectively distributed to members of the group. FCC's key contribution, however, is that they provide an explanation for why individuals choose ethnicity, as opposed to other group-membership criteria, as the basis for political coalitions. The reason individuals organize along ethnic identities, rather than any other possible category (class, region, issue-specific interest groups), according to these authors, is that ethnic groups serve as effectively enforceable minimal winning coalitions. The term ``minimal winning'' refers to the assumption that in the struggle over state resources, individuals face incentives to form coalitions that are large enough to beat other contestant groups, but only by a minimal margin, so as to maximize the per-capita value of the resulting spoils. In this context, ethnicity is argued to be a particularly effective criterion for defining group membership, because it prevents outsiders from joining a coalition once it has gained access to state resources. Since social markers that serve as the basis of ethnic identities, such as language, religion and phenotype, are often easily visible and difficult to change, coalition membership can be enforced even after the inter-group contest over the rewards. If group membership is not enforceable, rewards will be diluted, and, in anticipation of this effect, the initial incentive for mobilizing minimal winning coalitions will be much weaker.
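
As a purely illustrative sketch of this logic (the notation and the simple-majority winning rule are assumptions introduced here, not part of FCC's formal models), suppose a fixed amount of rents $R$ is contested in a population of size $N$, and a coalition of size $n$ wins if it contains more than half of the population. If the spoils are shared equally within the winning coalition, each member receives
\begin{equation*}
\pi(n) = \frac{R}{n}, \qquad n > \frac{N}{2},
\end{equation*}
which is strictly decreasing in $n$, so that members prefer the smallest coalition that still wins. If $m$ outsiders can join after the contest because membership is not enforceable, the per-capita payoff falls to
\begin{equation*}
\pi(n, m) = \frac{R}{n + m} < \frac{R}{n} \quad \text{for } m > 0,
\end{equation*}
which is the dilution effect that, on this reading, visible and hard-to-change ethnic markers help to prevent.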

From this basic model of ethnic salience FCC derive a number of empirical predictions with regard to where we should observe politically relevant ethnicity, and which social categories should emerge as salient ethnic identities. One noteworthy prediction in both papers is that particularly rigid and easily detectable social markers, such as phenotype and language, should be associated with salient ethnic cleavages more often than more porous and inconspicuous attributes, such as religious denomination. More importantly, however, both articles relate the emergence of politically relevant ethnicity to the availability of government resources that can be appropriated by ethnic coalitions and distributed to their members. These resources, which Fearon calls ``pork'' and Caselli and Coleman refer to as ``expropriable assets'', have in common that the government manages their allocation, they are excludable (otherwise coalition formation would be unnecessary), and they are rivalrous in consumption (thus creating incentives for minimal winning coalitions). Examples include lump-sum handouts or civil service positions. This prediction is of particular relevance for the research question at hand because it has very straightforward implications with regard to the effect of large-scale natural resource endowments on identity formation. Resource windfalls clearly satisfy all of the above-mentioned criteria, and should thus provide major incentives for ethnic mobilization. In fact, \citet[p. 5]{Caselli2006} explicitly mention mineral resource revenue as an incentive for ethnic coalition building.

\subsection{Does Petroleum Matter?}

Although the instrumentalist framework provided by FCC implies a very straightforward connection between natural resource wealth and politically salient ethnic identities, the relationship has so far not been tested systematically. However, as already alluded to in the introduction, there are good reasons to do so. First, showing that petroleum affects the emergence of politically relevant ethnic groups would lend considerable support to the instrumentalist approach in general, and Fearon (1999) and Caselli and Coleman's (2006) theory in particular. Though, as I will argue in the next section, the latter authors' theory is certainly not the only possible explanation for a link between petroleum extraction and ethnic salience, finding corresponding evidence would strongly speak in favor of the instrumentalist assumption that ethnic salience is the outcome of relatively short-term, individual-level means-ends reasoning.

Second, demonstrating an empirical link between petroleum extraction and ethnic salience would add further evidence to the constructivist understanding of ethnicity, and underscore Laitin and Posner's (2001) argument that empirical researchers should not frame ethnicity as a demographic constant, but as a context-dependent and evolving social construction.

Finally, finding that petroleum affects the emergence of politically relevant ethnic identities may have major implications for the various research agendas relating petroleum extraction to adverse political outcomes, such as slow economic growth (e.g. \citealt{Sachs2001}), failure to democratize (e.g. \citealt{Ross2001}, \citealt{Smith2004}, \citealt{Jensen2004}), or violent intrastate conflict (e.g. \citealt{Humphreys2005}, \citealt{Ross2006}, \citealt{Lujala2010}). 

On the one hand, showing that petroleum affects ethnic salience would have important theoretical implications for these literatures, in that it would suggest that the various ``resource curses'' may run through the creation of horizontal ethnic cleavages. In the case of regime type, for instance, it may be the case that petroleum impedes democratization because it allows leaders to stay in power by effectively paying off relatively small, well-defined ethnic coalitions. Similarly, the link between petroleum and violent conflict may be caused by the emergence of previously politically irrelevant ethnic communities that contest government authority over the allocation of resource rents (similar arguments have been made by \citealt{Ross2006}, \citealt[ch. 5]{Ross2012}). On the other hand, if it is the case that petroleum extraction affects the creation of politically salient ethnic identities, this may have important implications for how we should analyze the various ``resource curses'' empirically. For instance, if the effect of petroleum on the outbreak of civil conflict runs through the emergence of salient ethnic identities, adding measurements of ethnic fractionalization and resource wealth as linear predictors into a conflict-onset regression to see which explanation receives more support, as is practiced by \citet{Collier1998}, will generate misleading results, since we would be controlling for a factor that lies on the causal path between petroleum and conflict. Similarly, if petroleum affects the emergence of politically relevant ethnic groups, analyses attempting to estimate the effect of petroleum on the outbreak of ethnic conflict that use politically relevant ethnic groups as the unit of analysis (such as \citealt{Sorens2011}) will suffer from selection problems.
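
The concern about controlling for ethnicity can be illustrated with a stylized linear system; the functional forms and symbols below are assumptions made for exposition and are not taken from the cited studies. Let $P_i$ denote petroleum wealth, $E_i$ a measure of politically salient ethnic cleavages, and $Y_i$ conflict onset in unit $i$, and suppose
\begin{align*}
E_i &= \delta_0 + \delta_1 P_i + u_i, \\
Y_i &= \alpha + \beta P_i + \gamma E_i + \varepsilon_i .
\end{align*}
Substituting the first equation into the second gives
\begin{equation*}
Y_i = (\alpha + \gamma\delta_0) + (\beta + \gamma\delta_1) P_i + (\gamma u_i + \varepsilon_i),
\end{equation*}
so that, in this sketch, the total effect of petroleum on conflict is $\beta + \gamma\delta_1$. A regression of $Y_i$ on both $P_i$ and $E_i$ recovers at most the direct effect $\beta$. If petroleum operates mainly through ethnic salience ($\beta \approx 0$ and $\gamma\delta_1 > 0$), such a regression would attribute the effect to ethnicity and understate the role of petroleum.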




\section{Theoretical Framework}
\label{Sec:3}

In this section I present a theoretical framework with the goal of explaining why we should expect the emergence of politically salient ethnic identities in the presence of large-scale petroleum extraction.

Far from introducing a truly novel model of identity formation, I will adopt the core premises of the instrumentalist theories of ethnic salience, and build substantially on the work of Fearon (1999) and Caselli and Coleman (2006). However, I will attempt to complement the latter authors' arguments, which would imply that the role of natural resources in creating and modifying ethnic identities runs primarily through monetary incentives, with an alternative explanation, which locates the identity-establishing effects of petroleum in its adverse impacts on local communities.

To make the exposition more accessible, I divide it into three successive parts.
\begin{itemize}
\item First, I argue that petroleum extraction creates incentives for individuals living near production sites to organize politically, thus creating horizontal and geographically delimited political cleavages.
\item Second, I argue that there are reasons to expect that this process will lead to relatively small political coalitions, that is, groups that contain only a relatively small fraction of a country's population.
\item Finally, I will present arguments that suggest that such coalitions will often be defined via ethnic markers, rather than any other social delimiter.
\end{itemize}

\subsection{Why does petroleum extraction lead to horizontal cleavages?}

I argue that proximity to petroleum extraction sites creates incentives for political mobilization among the local population. Specifically, I present two explanations for why this is the case.

First, building on Fearon (1999) and Caselli and Coleman's (2006) framework, one may argue that petroleum extraction creates incentives to build horizontal cleavages for the purpose of acquiring resource rents, and distributing them among group members. However, FCC's argument does not yet entail a geographical component -- why should individuals in proximity to resource extraction sites face incentives to organize into a common coalition, rather than to join any other potential grouping of individuals in a country? I argue that this is the case because individuals anticipate that a group consisting of citizens living in proximity to a petroleum extraction site will be in a privileged bargaining position vis-\`a-vis other coalitions in the struggle over access to government revenue. Groups inhabiting the area surrounding resource extraction sites will be able to gain access to a greater share of resource windfalls because they have substantial bargaining leverage -- they may threaten to use institutional or extra-institutional means to impede the state's ability to extract petroleum, and thus shrink the total size of the pie available for distribution. Such measures may range from purely legal challenges, to organized protests, to the ultimate threat of attempting secession and thus cutting off the rest of the country from resource spoils entirely.

A second argument for why individuals in proximity to petroleum extraction sites face incentives to mobilize is that the extraction process itself creates common policy preferences, which are most effectively pursued by pooling one's resources. It is widely known that, especially in developing countries, petroleum extraction is often accompanied by a host of negative externalities for local communities. Without well-established local governance structures and an effective legal system, the establishment of an extractive industry is very likely to impose substantial costs on the surrounding population. For instance, in Indonesia's Aceh province, the combination of rapid industrialization with little or no governance institutions in place has led to land expropriation, catastrophic pollution, and massive in-migration \citep[p. 35]{Kell2010}. The same is documented for many other cases of natural resource extraction, for instance in the Niger Delta \citep{Watts2004}, Sierra Leone \citep{Richards1996} and Ghana \citep[p. 12]{Switzer2001}. These externalities will likely shift local individuals' political demands towards the management of these issues, thus aligning the policy preferences of individuals in petroleum-producing areas, and creating the basis for effective political mobilization.

It is worth noting that these two arguments for why we should see political mobilization in petroleum-producing areas have appeared in similar form in the literature that attempts to explain the link between natural resource production and violent conflict. There, arguments of the first type, which relate political mobilization to ``prize-grabbing'' incentives, and arguments of the second type, which highlight the negative externalities of the extraction process, are often framed as competing mechanisms for explaining the statistical relationship between resource extraction and violent civil conflict (see, e.g., \citealt{Ross2004a}, \citealt{Humphreys2005}). However, there is little reason to believe that these are mutually exclusive. In fact, they appear to be fairly complementary: individuals in resource-producing regions may organize for the purpose of appropriating a greater share of resource rents precisely because they feel entitled to be compensated for the adverse effects of petroleum production.

\subsection{Why does petroleum extraction lead to relatively small political coalitions?}

I argue that resource extraction will not only set incentives for the emergence of spatially concentrated political coalitions, but that these coalitions will typically be relatively small in comparison to the overall population of the country. 

Again, I propose two mechanisms underlying this claim.

The first builds on the argument made above that political mobilization in resource-rich areas occurs for the purpose of appropriating resource rents for in-group distribution. As argued by FCC, because such rents are rivalrous in consumption, there are strong incentives to form minimal winning coalitions, so as to maximize expected per-capita payoffs. However, as highlighted by FCC, these incentives will affect any coalition-building effort with the goal of appropriating political ``pork'', not just coalitions in resource-producing areas. So why would we expect the latter to be particularly small? The answer is, again, bargaining leverage.

Having direct access to petroleum extraction sites eases the requirement to create larger coalitions with more members for the purpose of gaining leverage vis-\`a-vis other contestant coalitions. Put differently, proximity to extraction sites increases a group's per-capita leverage; you can gain a lot of influence on the national scene if you are a well-organized group sitting on top of your country's main source of income. Hence, while incentives to appropriate resource revenue by forming a particularly small political coalition are universal, only groups in the producing areas have the necessary political leverage to actually do so. 
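
One hedged way to make this argument concrete, using notation that is assumed here rather than drawn from FCC, is the following. Let each member of a coalition contribute bargaining leverage $\lambda > 0$, and suppose a coalition of size $n$ secures a claim on rents $R$ only if its total leverage reaches a threshold $T$, that is, $\lambda n \geq T$. Ignoring integer constraints, the smallest successful coalition and its per-capita payoff are then
\begin{equation*}
n^{*} = \frac{T}{\lambda}, \qquad \frac{R}{n^{*}} = \frac{\lambda R}{T}.
\end{equation*}
If residents near extraction sites have higher per-capita leverage than residents elsewhere ($\lambda_{\text{oil}} > \lambda_{\text{other}}$), their minimal winning coalition is smaller, $n^{*}_{\text{oil}} < n^{*}_{\text{other}}$, and each member's share of the rents is correspondingly larger.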

The second mechanism I propose to explain coalition size in petroleum rich areas builds on the argument that the negative externalities associated with resource extraction create corresponding political demands only on a very local scale. Resource extraction often entails consequences so severe that it is straightforward to expect that their management will become the dominant policy issue in local communities, and will generate significant potential for mobilization. However, because these externalities are spatially confined to extraction sites, the absolute number of individuals affected by them, and willing to mobilize along this issue, will be relatively minor. In fact, because the effects of imposing stricter regulation on petroleum extractors will only benefit local communities, yet impose costs on the entire country (even if only in the form of opportunity costs), we may expect that individuals living outside resource extracting areas will have little interest in introducing such measures. Hence, because the adverse effects of resource extraction are spatially concentrated, yet represent a highly salient policy issue for the affected communities, we would expect to see the emergence of relatively small political coalitions in resource rich areas that mobilize around demands for stricter regulation of the petroleum industry.

\subsection{Why would political coalitions in petroleum-producing areas form along ethnic identities?}

Even if, as postulated above, large-scale petroleum extraction creates incentives for mobilization into geographically concentrated political coalitions, this does not imply that the latter need to be defined in terms of ethnic identities. Alternatively, one could imagine mobilization on the basis of purely geographical delimiters (residents of a valley, or a river delta), or administrative or federal units. 
I identify two explanations for why we should expect ethnicity to be a particularly effective basis for coalition building in this context.

First, there is of course FCC's key argument that an ethnicity-based membership criterion is an attractive choice because it makes group membership more easily enforceable. Unlike a criterion that relies purely on place of residence, ethnicity provides social markers that are often easily detectable, and, more importantly, difficult to change for individuals. Hence, if coalitions from petroleum-rich areas are successful in acquiring a significant share of resource rents to distribute among their members, an ethnic membership criterion prevents outsiders from joining the group ex-post, and diluting per-capita payoffs. In fact, the fear that outsiders may want to benefit from in-group spoils may be especially pertinent in resource-producing communities, since these often experience substantial in-migration. In the absence of permanent markers that easily distinguish insiders from newcomers, members of resource producing communities may feel that their efforts to gain access to resource windfalls unjustly benefit newly arriving immigrants.

It is interesting to note that this argument does not hold if mobilization in resource-rich areas takes place primarily because locals want to eliminate the costs associated with large-scale petroleum extraction. Policies that combat the latter, such as environmental regulation and due process rules for land expropriations, are not political ``pork'' because they are not rivalrous in consumption. Consequently, the value of these policies to local communities does not diminish in the number of beneficiaries, and hence there is no need to enforce group-membership once they are adopted. 

However, there is a further explanation for why individuals in resource-rich regions would choose to mobilize along ethnic lines. If an ethnic group is based on a common language (or, less so, a common religious denomination), it is likely that its members' social networks will consist mainly of other group members. These pre-existing social networks, as well as the possibility of creating a public discourse that addresses group members exclusively, will make mobilization substantially less costly \citep{Bates1983}. Discussing this argument, Fearon (1999) also adds that repeated in-group interaction may foster trust among coethnics, which further facilitates mobilization \citep{Fearon1996}. Hence, resource-rich communities may organize along ethnic lines simply because it is a particularly effective mobilization strategy. 

\subsection{Hypotheses}

In the preceding section I have established that petroleum extraction should provide incentives for political mobilization of relatively small ethnic groups that are spatially concentrated in petroleum producing areas.
The goal of this section is to translate these theoretical considerations into explicit empirical expectations. 
Specifically, for the purpose of this paper, I will attempt to formulate testable implications that refer to petroleum's effect on the spatial composition of politically relevant ethnic groups in any given geographic region.

In order to derive a testable hypothesis of this kind it is useful to think in terms of counterfactuals: What would we expect the ethnic composition in a given petroleum-rich region to look like in the hypothetical absence of valuable resources? And, analogously, what would we expect the ethnic composition of a resource-poor region to look like if it contained oil and gas fields?
In practice, local ethnopolitical landscapes differ considerably across and within countries, even in the absence of high-value resources. For this reason, it is necessary to consider different counterfactual scenarios that take this heterogeneity into account in order to formulate precise empirical expectations.

Consider the following scenario: We observe a petroleum-free region within some country where, for reasons beyond the scope of this paper, we do not observe politically relevant territorially delimited ethnic groups. Though individuals may identify with various subnational ethnic groups, perhaps along linguistic divides, the latter do not constitute politically salient categories. 
Now imagine the counterfactual case where sizable oil reserves are located in the region under consideration. According to our theoretical framework, the latter situation should create significant incentives for individuals located near the petroleum extraction site to mobilize along some common ethnic delimiter, such as a common language, and issue political demands referring to their ethnic identity. This counterfactual comparison is illustrated in figure \ref{Fig:Hyp} and labeled \emph{scenario A}. The circular extract depicts a hypothetical arbitrary region within a country, the red dot represents a productive petroleum field, and the green area portrays the settlement area of some politically relevant ethnic group.

\begin{figure}[h!]
	\centering
		\includegraphics[scale=0.5]{Maps/hypotheses_20131031.pdf}
	\caption{Counterfactual case comparisons for different local ethnopolitical constellations. }
	\label{Fig:Hyp}
\end{figure}

Next, consider the following alternative scenario. We observe a petroleum-free region where individuals identify with two relatively large ethnic groups which constitute politically salient cleavages. Again, we now picture the counterfactual case where there are substantial petroleum reserves in the center of the given area. According to our theoretical framework, we might expect that petroleum production sets incentives for individuals to mobilize along a smaller ethnic identity than they would otherwise, composed of individuals living exclusively in the vicinity of the extraction site. In practice, one could imagine that individuals identify and mobilize on the basis of a common local dialect in the petroleum-abundant case, whereas mobilization and self-identification would focus on the larger language family in the absence of the incentives generated by high-value resources. This counterfactual comparison is labeled \emph{scenario B} in figure \ref{Fig:Hyp}, where the dark shaded areas represent the two larger politically relevant ethnic groups.

In order to pursue a comprehensive test of the proposed theoretical framework, it is desirable to derive a hypothesis that mirrors the effects expected in either scenario, since both conform to the logic proposed in the theoretical discussion.
However, doing so is not entirely trivial.
\emph{Scenario A}, for instance, may suggest testing the simple hypothesis that petroleum-rich regions should be more likely to feature at least one politically relevant territorial ethnic group in comparison to petroleum-free regions. However, this hypothesis constitutes only an incomplete test of the theory, since effects of the type depicted in scenario B would be ignored. In other words, the effect of petroleum production on ethnicity, even if present, would go unnoticed in regions where salient territorially delimited ethnicity is prevalent even in resource-poor regions.
Alternatively, one might consider testing the hypothesis that the average politically relevant ethnic group in petroleum-rich regions should be smaller (in demographic terms) than the average group in petroleum-free regions. Clearly, this test would be able to pick up the type of effect depicted in scenario B, where we expect a fractionalization of otherwise large groups into smaller ones. Unfortunately, however, this formulation induces a selection problem. Group size is only observable where we observe a politically relevant ethnic group in the first place -- consequently, the analysis would be restricted to a sample of areas featuring at least one politically relevant ethnic group. As is well known in the econometric literature, this type of non-random selection may induce substantive bias in statistical estimates.
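
The selection problem can be stated more explicitly. As an illustrative sketch, in notation introduced here purely for exposition, let $P_i \in \{0,1\}$ indicate petroleum production in region $i$, let $D_i \in \{0,1\}$ indicate the presence of at least one politically relevant ethnic group, and let $S_i$ denote group size, which is observed only if $D_i = 1$. A comparison of average group sizes then recovers
\[
	E[S_i \mid P_i = 1, D_i = 1] - E[S_i \mid P_i = 0, D_i = 1].
\]
If, as the theory suggests, $P_i$ also affects $D_i$, then the petroleum-rich and petroleum-free regions that remain in the sample with $D_i = 1$ may differ systematically from one another, and this difference need not correspond to the effect of petroleum on group size.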

Finally, as a third option that avoids these issues, I propose the following hypothesis: 
\begin{center}
\parbox{0.8\textwidth}{
	\emph{Petroleum producing areas should exhibit a larger number of politically relevant and territorially delimited ethnic groups than comparable, petroleum-free regions.}
}
\end{center}
This test of the theory is appealing because it applies for both scenarios considered, and its response variable is observable throughout the sample.   
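
Stated compactly, and again in purely illustrative notation that is not taken from the statistical models used later, let $N_i \in \{0, 1, 2, \dots\}$ denote the number of politically relevant and territorially delimited ethnic groups in region $i$, let $P_i \in \{0,1\}$ indicate petroleum production, and let $X_i$ collect the characteristics that make regions comparable. The hypothesis then reads
\[
	E[N_i \mid P_i = 1, X_i] > E[N_i \mid P_i = 0, X_i].
\]
In scenario A the count rises from zero to one, while in scenario B it rises from two to more than two. Because $N_i$ is defined for every region, including those with $N_i = 0$, no region has to be dropped from the sample.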

In addition to this core hypothesis, I suggest a qualifying proposition. Specifically, I argue that there are good reasons to expect that the effect postulated above \emph{only operates in relatively young, ex-colonial developing countries}. 
First, ex-colonial states frequently feature political institutions that favor rent-seeking behavior and clientelism over other, issue-specific mobilization platforms. It is well known that colonial powers have often left behind poorly crafted government institutions with a severe concentration of power at the executive level, and few independent constraints (see the discussion by \citet{Acemoglu2000} of extractive institutions, or Posner's (2007) characterization of African politics). This institutional setting provides strong incentives for the ``prize-grabbing'' type of politics discussed by FCC, where political coalitions form along horizontal cleavages with the goal of gaining access to government hand-outs and benefits. The key problem underlying this dynamic is the absence of effective checks and balances on executive power (Fearon 1999, who cites \cite[p. 166]{Limongi1997}), which would ensure that incumbent governments do not divert state resources exclusively to their constituency, but implement policies that benefit broad sections of the population. In the absence of such barriers, incumbents face strong incentives to secure their hold on power by providing ``pork'' to their supporters, and paying little attention to public goods provision. In this context, political mobilization will rely largely on attempts to acquire part of the pie for one's own group, rather than any issue-specific platforms. It is exactly in this type of environment that we would expect the rent-seeking argument underlying the above-stated hypothesis to be particularly likely to apply.  

A second reason why we would expect ex-colonial states to be of particular interest for the theory at hand is the pertinence of ethnicity-based political coalitions in these countries. Competing in political ``prize-grabbing'' based on ethnic clientelism has become the political modus operandi in many ex-colonial states \citep{Wimmer1997}, and particularly in Sub-Saharan Africa (\citealt{Lemerchand1972}, \citealt{Posner2005}). This is not least the case because of the legacy left by European colonial rulers, who have often deliberately shaped ethnic identities and institutionalized ethnicity-based patronage systems to govern their dependencies (\citealt{Horowitz1985}, \citealt{Young1994}). Naturally, in countries where political payoffs are almost exclusively determined by membership in an ethnic group, we would expect individuals in resource-rich regions to be particularly likely to mobilize on the basis of ethnic identities, regardless of whether they do so to address negative externalities or to access resource rents. In contrast, in countries where political platforms based on issue-related and cross-cutting cleavages are well established, entering the political arena on the basis of local ethnic identities will be much more difficult. 

Finally, ex-colonial states may be particularly susceptible to the proposed mechanisms simply because their citizens have a comparatively short history of common statehood. A core tenet of the classic literature on nationalism by \citet{Deutsch1953}, \citet{Anderson2006}, and \citet{Gellner1983} is that state-building and nation-building go hand in hand, and that the continued existence of centralized governance institutions has played a significant role in shaping national identities in Europe \citep[p. 13]{Posner2004b}. Hence, in countries with a long history of statehood, we would generally expect stronger national identities and less social heterogeneity, which complicates political mobilization along sub-national ethnic delimiters. Conversely, in young states, national identities will be relatively weak, and local identities more readily available for political mobilization.

\section{Research Design and Data}
\label{Sec:4}

I test the hypothesized relationships on the basis of a cross-sectional statistical analysis of geographically defined grid-cells. Specifically, I divide the world's land-mass into quadrangular tiles with an edge length of 100 km, each covering approximately 10'000 square kilometers of surface area. These grid-cells serve as basic units of analysis.\footnote{For the creation of the equal-area grid-cell data set I proceeded as follows: In a first step, I used the Gall-Peters cylindrical equal-area projection to divide the entire globe into square cells with 100 km edge-length and a surface of exactly 10'000 square kilometers. Second, I identified grid cells covering land mass via spatial intersection with the low-resolution GSHHS shoreline data (version 2.2.2, \citealt{Wessel1996}). I excluded the Antarctic, since it is fundamentally uninteresting for the subsequent analysis. Note that grid-cells intersecting with shorelines (rather than only landmass) may not actually cover 10'000 square kilometers of land-mass. In a third step, I assigned grid-cells to countries based on the international borders in effect on January 1st 2009, as identified by the CShapes data set (version 0.4-2, \citealt{Weidmann2010}). Assignment is based on a plurality rule: grid-cells are assigned to the country covering the highest share of the cell's total 10'000 square kilometers. Maritime claims are not counted for assignment. All spatial calculations were performed on a PostgreSQL 9.1 / PostGIS 1.5 database; code is available upon request.} To test the core hypotheses proposed in the previous section, I map onto these grid-cells spatial data on productive petroleum fields and the presence and size of politically relevant and territorially concentrated ethnic groups. The so-defined variables only refer to one particular point in time (2007 and 2009, respectively), thus making the analysis cross-sectional. 
To test hypothesis 1, I estimate whether grid-cells featuring petroleum production are more likely to cover a politically relevant, territorially concentrated ethnic group. To test the second hypothesis, I estimate whether among those grid-cells intersecting with the settlement area of politically relevant and territorially concentrated ethnic groups, petroleum production is associated with relatively smaller group size. That is, I test whether ethnic groups in petroleum producing areas encompass fewer members with respect to their host country's total population than other territorially concentrated groups. The two qualifying hypotheses concerning the restriction to ex-colonial states, and the conditioning effect of social heterogeneity, are tested using a split-sample, and appropriate interaction terms, respectively.
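
The construction of the grid-cells can be sketched formally. The following is an illustration under the simplifying assumption of a spherical Earth with radius $R$; the exact parameters of the original PostGIS computation are not reproduced here. The Gall-Peters projection is the cylindrical equal-area projection with standard parallels at $45^{\circ}$, mapping longitude $\lambda$ and latitude $\varphi$ (in radians) to
\[
	x = \frac{R \lambda}{\sqrt{2}}, \qquad y = \sqrt{2}\, R \sin \varphi .
\]
Since $dx\, dy = R^2 \cos\varphi \, d\lambda \, d\varphi$, which is the area element on the sphere, every square of $100 \times 100$ km in the projected plane corresponds to 10'000 square kilometers of surface area, regardless of latitude. Writing $G_i$ for the area of grid-cell $i$ and $T_c$ for the land territory of country $c$ (both symbols are introduced here for illustration), the plurality rule assigns each cell to
\[
	c(i) = \operatorname*{arg\,max}_{c} \; \operatorname{area}(G_i \cap T_c).
\]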

In the remainder of this section, I will discuss the reasoning underlying this particular research design, and elaborate on the specific data sources employed. The discussion of econometric specifications is postponed to the next section.

\subsection{Why grid-cells?}

Grid-cell based statistical analyses are an increasingly popular method in quantitative political science, and so far have been applied mainly in the domain of empirical conflict research (e.g., \citealt{Buhaug2006}, \citealt{Buhaug2011}, \citealt{Theisen2012}). I present three arguments in favor of employing grid-cells for testing the proposed hypotheses.

First, grid-cells are sub-national units of analysis in the sense that they allow capturing variance in a spatially defined phenomenon below the level of entire countries. This is important in the given context because the proposed hypotheses postulate a local relationship between petroleum extraction and the emergence of politically salient ethnic identities. 

Second, using grid-cells allows normalizing the surface area covered by each unit of analysis. Since I construct the employed grid-cell data set based on an equal-area projection, each grid-cell refers to (approximately) 10'000 square kilometers of surface area. This is of particular importance because the values of our main predictive variable, the presence of petroleum production in a given area, and our response variables, the presence and size of politically relevant ethnic groups in a given area, will be highly correlated with the spatial extent of the unit of analysis we are looking at. Thus, without appropriate measures to control for these associations, we may find a strong, but spurious, relationship between petroleum production and the size and presence of ethnic groups, driven entirely by the fact that we are more likely to find both petroleum and ethnic groups if we look at an ever larger stretch of territory. 
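
A stylized example, which is not part of the empirical analysis, illustrates the problem. Suppose that petroleum fields and ethnic settlement areas were scattered entirely independently of each other, such that the numbers of fields and groups intersecting a unit of surface area $A$ follow Poisson distributions with means $\lambda_P A$ and $\lambda_E A$, respectively (the Poisson assumption and the symbols $\lambda_P$ and $\lambda_E$ are made up for this illustration). Then
\[
	\Pr(\text{petroleum present} \mid A) = 1 - e^{-\lambda_P A}, \qquad
	\Pr(\text{group present} \mid A) = 1 - e^{-\lambda_E A}.
\]
Both probabilities increase in $A$. If $A$ varies across units, the two indicators are therefore positively correlated in the sample, even though they are independent among units of any given size.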

One way to prevent such bias is to include a measure of the units' surface area as a statistical control in the empirical analysis. However, this strategy requires us to correctly model the possibly nonlinear relationship between surface area and our response variables, and consumes valuable degrees of freedom. I would argue that normalizing the surface area of the units of analysis is a more effective strategy to this end.

It is interesting to note that the fundamental positive relationship between measurements of the presence or quantity of petroleum (or any mineral resource) in a given location, and the surface area covered by the employed units of analysis is often ignored in the empirical ``resource curse'' literature. For instance, in the numerous studies linking countries' oil production, reserves, or exports to the outbreak of civil conflict, researchers rarely (if ever) control for the possibility that larger countries \emph{will} have greater oil reserves, and may see more territorial conflict simply because they cover more territory (e.g. \citealt{FearonLaitin2003}, \citealt{Humphreys2005}, \citealt{Ross2006}, \citealt{Lujala2010}, \citealt{Ross2012}). The issue also extends to cell-based analyses where the grid-cell data set used is not based on an equal area projection, as is the case, for instance, with the popular PRIOGRID data set \citep{Tollefsen2012}.

The third argument in favor of using grid-cells is that the definition of the units of analysis is fully exogenous to the causal mechanisms under investigation (\citealt{Buhaug2011} make a similar claim). Due to their arbitrary definition, the presence and size of the equal-area grid-cells are fully independent of our main predictive variable, petroleum production. This claim would be more difficult to defend if we used some ``natural'' sub-national unit of analysis, for instance administrative units (as applied by, e.g., \citealt{Ostby2009}). It is not difficult to conceive of a mechanism that links petroleum production to the location and size of administrative units; hence, if we based our analyses on the latter, we would again encounter potential bias in our estimates.

Despite these arguments in favor of a grid-cell based analysis, there are noteworthy caveats. First, employing a sub-national, geographically defined unit of analysis will induce spatial dependence in statistical analysis. This issue will receive further discussion in the next section. However, it is important to mention that this problem is not unique to grid-cell based analyses, but would also apply if I used, e.g., first-level administrative units.  

A second problem associated with employing grid-cells is the well-known modifiable areal unit problem (MAUP), as discussed by \citet{Openshaw1983}. The key point raised by Openshaw is that using arbitrarily defined spatial units of analysis introduces substantial model uncertainty. That is, size and standard errors of the estimated coefficients may change as a function of the arbitrary specifications used to define the units of analysis. At the very least, this raises concerns about the robustness of the reported results; ideally, the conclusions drawn from statistical analysis should not be dependent on arbitrary modeling assumptions. At worst, the MAUP introduces possibilities for the researcher to exploit modeling uncertainty to generate desired results. The best way to avoid the MAUP would be to use non-arbitrary units of analysis. However, for the above stated reasons, I believe the grid-cell approach to be superior to the potentially endogeneity-inducing alternatives. Nevertheless, to mitigate the risk of drawing conclusions from non-robust findings, I (plan to) replicate all statistical models using a second grid-cell data set with smaller cells (with 50 km edge-length), and a different point of origin. Unfortunately, these results are still pending at this time, and will be included in future iterations of this paper.

A third caveat associated with grid-cells is that any substantive interpretation of the size of the estimated coefficients will be difficult. Though we can draw conclusions about the validity of our hypotheses from the sign of the estimated effects, it is difficult to put a probabilistic statement referring to a completely arbitrary geographical unit into a meaningful substantive context.

\subsection{Why cross-sectional?}
Optimally, I would test the proposed hypotheses using time-varying data, which would allow tracking the hypothesized process of petroleum production and the subsequent emergence of salient ethnic identities. However, instead, I perform a purely cross-sectional analysis, mainly due to feasibility constraints.

First, there is substantial uncertainty in the intertemporal component of the available data, which limits the value of employing time-series analysis. This is true in particular with regard to the PETRODATA data set \citep{Lujala2007}, from which I take data on the location of petroleum fields. Although PETRODATA is clearly an invaluable asset and represents the most comprehensive freely accessible data source on petroleum fields, its coding of fields' first production dates features a substantial fraction of missing values. Of the total 892 onshore petroleum field polygons identified in PETRODATA, first production dates are missing for 475 observations. Hence, even if we adopted a panel framework for testing the proposed hypotheses, there would not be much temporal variance in the main independent variable. The same caveat also applies, in lesser form, to the ethnicity data from the EPR and GeoEPR data sets (by \citealt{Cederman2010} and \citealt{Wucherpfennig2011}, described in more detail below), which I use in this paper. Though the latter data sets employ a time-variant coding of politically relevant ethnic groups, the exact dates assigned to the emergence of a salient ethnic identity must be interpreted with caution. Though the EPR coders have surely undertaken substantial efforts to make their codings as accurate as possible, the task of assigning a single year to the often long-winded process of the emergence of politically salient ethnic identities remains challenging. 

Second, moving from a cross-section to a panel analysis would introduce substantial methodological challenges. In particular, we would have to address not just spatial, but also temporal (error-)dependence. In a ``standard'' econometric setting (e.g. normal errors, single equation estimation, no spatial dependence), doing so would be rather trivial. However, since we will have to perform statistical modeling under rather ``non-standard'' conditions, the additional requirement of having to correct for temporal dependence leads to significant additional complexity. 

Fortunately, despite the desirability of a time-variant analysis, there are good reasons to believe that we are able to test the proposed hypotheses even based on cross-sectional statistics. Since the presence of petroleum fields in a particular area can be assumed to be fairly exogenous with respect to the outcome under scrutiny, modeling intertemporal dynamics is not crucial for the purpose of inferring causality. In other words, if we find that petroleum production is associated with ethnic salience, we may plausibly assume that the former preceded the latter.

\subsection{Data}

\subsubsection{Politically Relevant Ethnicity}

In order to test the postulated hypotheses, I require measures of the presence and (relative) size of politically relevant and territorially delimited ethnic groups in a particular geographic area. For this purpose, I use the EPR and GeoEPR data sets (versions 2.0). The EPR (Ethnic Power Relations) data set is an effort to identify all politically relevant ethnic groups and their access to state power, across the globe, for the period between 1946 and 2009 \citep{Cederman2010}. Groups are considered politically relevant ``if at least one significant political actor claims to represent the interests of that group in the national political arena, or if members of an ethnic category are systematically and intentionally discriminated against in the domain of public politics'' \citep[p. 2]{Vogt2011}. The GeoEPR data set is a spatial extension to EPR and provides information on ethnic groups' settlement type and area within a country \citep{Wucherpfennig2011}. In particular, for ethnic groups with territorially delimited settlement areas, which are of particular interest for the present analysis, GeoEPR provides geo-coded polygons identifying the latter.

Based on these data sets, I compile two variables on the grid-cell level that serve as response variables in the subsequent empirical analyses. First, I create a binary variable measuring the presence of a politically relevant and territorially concentrated ethnic group in a given grid-cell in the year 2009. I do so by performing a spatial intersection operation involving the appropriate GeoEPR polygons and the grid-cells. Second, based on data provided in the original EPR data set, I create a variable measuring the relative group size of the smallest territorially concentrated ethnic group in a given grid-cell. Naturally, this variable is only defined for grid-cells where we see territorially concentrated and politically relevant ethnic groups in the first place, which introduces a number of econometric challenges discussed in the next section.
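
For concreteness, the two response variables may be written as follows; the notation is introduced here for illustration only. Let $G_i$ denote the area of grid-cell $i$, let $T_g$ denote the GeoEPR settlement polygon of a politically relevant, territorially concentrated group $g$ in 2009, and let $s_g \in (0,1]$ denote that group's size relative to its host country's population as recorded in EPR. Then
\[
	Y_i = \mathbf{1}\bigl[\,\exists\, g : G_i \cap T_g \neq \emptyset \,\bigr],
	\qquad
	S_i = \min_{g \,:\, G_i \cap T_g \neq \emptyset} s_g ,
\]
where $S_i$ is defined only for cells with $Y_i = 1$.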

For a cartographic representation of the cell-level group size variable, see figures \ref{Map:App1} and \ref{Map:App3} in the appendix.

\subsubsection{Petroleum}

My primary source for data on the location of productive petroleum fields, that is, productive oil and natural gas fields, is the PETRODATA data set by \citet{Lujala2007}. PETRODATA provides geo-coded data indicating the location of petroleum fields on a global scale, which I map onto the grid-cell data set using a simple spatial intersection operation, thus creating a binary variable measuring the presence of petroleum fields in a given grid-cell. If a cell contains one or more petroleum fields, but none of these are coded by \citet{Lujala2007} as having been productive as of 2007, it is assigned a value of zero on the binary variable. Fields with unknown production status are assumed to be productive.
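
The coding rule for the petroleum dummy can be summarized as follows, in notation that is introduced here for illustration. Let $G_i$ denote the area of grid-cell $i$, let $F_f$ denote the PETRODATA polygon of field $f$, and let $q_f$ record the field's production status as of 2007, taking the values \emph{productive}, \emph{non-productive}, or \emph{unknown}. Then
\[
	P_i = \mathbf{1}\bigl[\,\exists\, f : G_i \cap F_f \neq \emptyset \ \text{and} \ q_f \neq \text{\emph{non-productive}} \,\bigr],
\]
so that fields with unknown status count towards $P_i = 1$, whereas a cell containing only non-productive fields receives $P_i = 0$.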

Figures \ref{Map:App2} and \ref{Map:App4} in the appendix display maps of the cell-level petroleum production dummy.

\subsubsection{Ex-colonial States}

To test the qualifying hypothesis stating that the effects postulated in hypotheses 1 and 2 only apply in ex-colonial developing countries, I create a sub-sample based on three restrictions. I only include grid-cells in countries that (1) have once been an overseas colony of a European power, (2) have gained independence in the 20th century, and (3) are located either in Sub-Saharan Africa or Asia. In accordance with the theoretical reasoning underlying the corresponding hypothesis, these criteria define a sub-sample of relatively young polities which have been ruled by a colonial power for a significant period of time, and with little or no history of independent statehood prior to colonization (see \citealt{Acemoglu2000} for a similar definition). The restriction to Sub-Saharan Africa and Asia primarily excludes Middle Eastern and North African countries, which have often spent only short time periods under European rule, and have frequently already been members of the state system prior to colonization, either as part of the Ottoman Empire, or as independent polities (e.g., Algeria, Tunisia, Morocco, Libya, Persia (Iran), and Egypt).

Clearly, this definition of the ex-colonial sub-sample is debatable. Admittedly, it is based on somewhat inductive considerations; preliminary efforts to test the hypotheses using a less restrictive sub-sample of ex-colonial states have yielded much more inconclusive results than the ones using the above-given definition. Unfortunately, I have not yet had the chance to thoroughly investigate the origins of this discrepancy. This point will be developed in more detail in future iterations of the paper. 

\subsubsection{Linguistic Heterogeneity}

To test the second proposed qualifying proposition, I require a spatially disaggregated measure of the presence of social categories that may serve as the basis for ethnic identification. For this purpose, I use the GREG data set by \citet{Weidmann2010b}. The GREG data set is a digitized spatial representation of the ethnolinguistic groups listed in the Soviet Atlas Narodov Mira (AMS). The latter is the product of an effort by Soviet researchers in the early 1960s to compile a global atlas of ethnolinguistic groups.  The GREG data provide the possibility to perform at least a partial test of the social heterogeneity hypothesis: Since the AMS identifies ethnolinguistic groups regardless of political salience, we may use the GREG data to construct a spatially disaggregated proxy of potentially mobilizable ethnolinguistic differences. Clearly it would be desirable to have similar data for other potentially salient ethnic categories, such as religion and phenotype, but unfortunately such data is currently not available. Moreover, it should be noted that the AMS data contains a number of more or less clearly identifiable errors \citep{Posner2004}; however, since these are unlikely to be associated with the presence of petroleum production, they should not lead to biased inference, but only to greater uncertainty in the estimates.

Specifically, in order to generate a rough proxy of potentially mobilizable ethnic categories in a specific location, I overlap GREG polygons with the grid-cell data and calculate the relative demographic size of the smallest ethnolinguistic group per grid-cell. The relative demographic size of GREG groups is estimated using the method proposed by \citet{Weidmann2010b}, namely by overlapping the GREG polygons with GPW 3.0 GIS data on population density \citep{CIESIN2005}. Unfortunately, this method does not allow us to estimate GREG group size in the early 1960s, since the earliest iteration of the GPW 3.0 data refers to the year 1990. However, we may safely assume that relative population densities change slowly, thus minimizing the error caused by the temporal discrepancy.
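
As a rough illustration of the final step, the following R sketch computes the proxy from a table that is assumed to already contain, for every intersection of a grid-cell with a GREG polygon, the population falling into that intersection. The data frame \texttt{overlay}, its column names, and all numbers are hypothetical; the GIS overlay itself is not shown.

\begin{lstlisting}
# Hypothetical toy input: one row per (grid-cell, GREG group) intersection,
# with the population falling into that intersection. All values are made up.
overlay <- data.frame(
  cell  = c(1, 1, 1, 2, 2, 3),
  group = c("A", "B", "C", "A", "D", "B"),
  pop   = c(600, 300, 100, 50, 150, 80)
)

# Relative demographic size of each group within its cell.
cell_total <- ave(overlay$pop, overlay$cell, FUN = sum)
overlay$share <- overlay$pop / cell_total

# Proxy: relative size of the smallest ethnolinguistic group per cell.
greg_min <- tapply(overlay$share, overlay$cell, min)
print(greg_min)
\end{lstlisting}

Under this construction, a cell inhabited by a single group takes the value one, while low values indicate the presence of a demographically small ethnolinguistic group.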

In accordance with the proposed hypothesis, I expect petroleum extraction to lead to the emergence of small politically relevant ethnic groups only if the newly created variable takes on low values; that is, I expect that the emergence of salient ethnic identification due to petroleum extraction occurs only in areas with sufficient linguistic heterogeneity to distinguish local residents from outsiders.

\subsubsection{Other Controls}

Finally, I employ four control variables in order to address concerns about potentially spurious relationships.

First, although in hypotheses 1 and 2 we postulate a local effect of petroleum production on ethnic salience, it is very well possible that the actual relationship runs purely through indirect channels. That is, there might be an effect of country-level petroleum production on the emergence of politically relevant ethnic groups in general, regardless of the particular location of the extractive industry. In fact, this type of country-level effect is what we might expect given FCC's hypothesis that political ``pork'' increases ethnic coalition building in general. To ensure that our local, cell-level petroleum variable does not simply pick up such indirect effects, I propose to include an aggregate binary variable indicating petroleum production at the country-level as a statistical control. Unfortunately, this step necessitates the inclusion of yet another control variable: the surface area of a given cell's host country. Clearly, the presence of country-level petroleum production is positively correlated with a given country's geographical extent; similarly, the presence of salient ethnic identities, and their relative demographic size in particular, may be associated with a country's size. Hence, all models including the country-level petroleum control also include a variable measuring countries' surface area.\footnote{The variable used in the subsequent statistical models measures surface area linearly. I have also tested a log-transformed variable, which has proven to be a poorer fit to the data.}

Second, although whether we observe petroleum production in a particular location is largely determined by geology, it is also a function of the probability of reserves being discovered and extraction being considered profitable. The latter factors, in turn, are likely to be correlated with levels of economic development. Oil and gas exploration, as well as the establishment of a viable extractive industry, are (human-)capital intensive activities that are more easily conducted in economies with a well-functioning capital market, a well-established industrial sector, and a well-educated workforce. Because the presence of politically salient ethnic cleavages is also a phenomenon that is particularly frequent in developing countries \citep{Posner2004b}, I use a variable measuring GDP per capita levels in 1965 from \citet{Hunziker2012} to control for this relationship. Why use data from 1965 and not 2009? Income levels not only affect petroleum production through the mechanisms just discussed; petroleum production also straightforwardly affects income levels. Controlling for GDP per capita in 2009 would therefore introduce massive collinearity issues, even if there is an independent causal effect of petroleum extraction on ethnic salience. The reasoning behind the use of GDP per capita data from 1965 is that this variable is able to identify those low-income countries where petroleum discovery and extraction are less likely for the reasons stated above, while being less affected by the reverse effect of petroleum revenue on income levels. The latter argument follows from the fact that the emergence of petrostates, where gross domestic output is largely determined by petroleum revenue, took place primarily in the oil boom of the 1970s and the subsequent decades. The choice of 1965 is a compromise between this consideration and the fact that earlier data are available only for an ever smaller sample of independent countries.

Third, I control for cell-level population in 1990 using data from the GPW 3.0 data set \citep{CIESIN2005}, which provides geo-coded information on population density for the entire globe. The logic underlying this decision is to ensure that the hypothesized effect of petroleum production on the formation of ethnic coalitions does not work exclusively through purely demographic processes. One could argue that petroleum production simply leads to in-migration into previously uninhabited areas, thus only providing the most basic precondition for the emergence of salient ethnicity: humans. This could lead to a small but positive linear association between petroleum production and the presence of politically relevant ethnic groups, even though the two are linked only via a very weak causal process. Including the population variable is intended to control for this possibility.

Finally, note that all variables used in the subsequent statistical analyses are summarized in table \ref{Tab:App1} in the appendix.





\section{Econometric Specification}
\label{Sec:5}

This section specifies and discusses the econometric models applied to test the postulated hypotheses. 

Unfortunately, the specified research design does not allow the estimation of what one may consider ``standard'' cross-sectional regression models. Rather, two potential issues deserve special attention. 
First, due to the employment of geographically defined grid-cells as units of analysis, I will have to address the issue of spatial dependence. Second, due to the partial observability of the second main outcome variable -- the size of territorially concentrated and politically relevant ethnic groups in a particular location -- I will have to discuss modeling choices for non-random sample selection.

\subsection{Spatial Dependence}

Grid-cell level measurements of the presence and size of territorially concentrated ethnic groups do not represent independent observations. Rather, these variables will show strong spatial dependence; i.e., a cell's value on one of these variables is strongly associated with nearby cells' values. This is the case because we divide a continuous spatial phenomenon (settlement patterns of territorially concentrated ethnic groups) into discrete units, with one group polygon potentially covering multiple cells. 

In a regression context, spatial dependence may give rise to two consequential deviations from ``standard'' modeling assumptions. First, there may be diffusion effects where a given cell's value on the outcome variable directly affects its neighboring cells' values on the outcome variable. Second, there will be spatial error dependence violating the \emph{iid} assumptions usually underlying the estimation of standard errors. 

In the domain of spatial econometrics, there are two well-established strategies to address these issues \citep{Ward2008}:
\begin{itemize}
	\item \emph{Spatial-lag models} address both diffusion effects and error dependence in a single framework. In their linear additive representation, spatial-lag models are specified as follows:
	\[ y = \rho \boldsymbol{W} y + \boldsymbol{X} \beta + \epsilon,\]
	where $y$ is the response vector, $\boldsymbol{X}$ is a matrix of predictors (including a constant) with associated coefficient vector $\beta$, and $\epsilon$ is an \emph{iid} error term satisfying the usual assumptions. The spatial component of the model is represented by the first term on the right-hand side of the equation, which links a given cell's response value to all other observations' outcomes via a spatially defined connectivity matrix $\boldsymbol{W}$ and the spatial-lag parameter $\rho$, which captures the degree of spatial diffusion. It is straightforward to see that the spatial-lag specification allows for diffusion effects, with neighboring cells (as defined by $\boldsymbol{W}$) directly affecting the given cell's outcome value. 
	\item In contrast to the spatial-lag model, the \emph{spatial error} model only addresses error dependence, and does not take into account diffusion effects. In other words, the spatial error model assumes that the modeled regressors $\boldsymbol{X}$ only have a local effect on the outcome, and do not affect neighboring cells. Only the error term is assumed to be spatially dependent, which is tantamount to saying that only unmodeled determinants of the outcome have non-local effects. This translates to the following specification:
	\[ y =  \boldsymbol{X} \beta + \lambda \boldsymbol{W} \xi + \epsilon,\]
	where the spatial component is represented by $\lambda \boldsymbol{W} \xi$. In the spatial error model, the error term is decomposed into $\epsilon$, which is again \emph{iid}, and $\xi$, which is the spatially dependent error component that affects a given cell's response value through an appropriately defined connectivity matrix $\boldsymbol{W}$ and the spatial-dependence parameter $\lambda$.  
\end{itemize}
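
To see why the spatial-lag specification implies diffusion, it may help to write the model in reduced form. As a brief sketch, assume that $\boldsymbol{I} - \rho \boldsymbol{W}$ is invertible, where $\boldsymbol{I}$ denotes the identity matrix (this holds, for instance, if $\boldsymbol{W}$ is row-standardized and $|\rho| < 1$; neither assumption is imposed in the text above). Solving the spatial-lag equation for $y$ then gives
\[ y = (\boldsymbol{I} - \rho \boldsymbol{W})^{-1} (\boldsymbol{X} \beta + \epsilon) = \sum_{k=0}^{\infty} \rho^k \boldsymbol{W}^k (\boldsymbol{X} \beta + \epsilon), \]
where the series representation holds under the same assumption. A change in a regressor in one cell thus affects that cell's own outcome ($k = 0$), the outcomes of its neighbors ($k = 1$), the outcomes of the neighbors' neighbors ($k = 2$), and so on, with weights that decay in $\rho^k$.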

In theoretical terms, the spatial-lag model would be the most preferable option for estimating the relationships under scrutiny, due to its ability to capture diffusion effects. It is straightforward to assume that petroleum in one cell will affect the presence of a politically salient ethnic group in a neighboring cell, not least because the two cells may in practice be covered by the same group polygon. Unfortunately, for reasons discussed below, estimating either of the above-mentioned spatial models demands measures that are beyond the scope of this paper. Instead, I pursue a framework where I discard possible diffusion effects, and correct for spatial error dependence through country-wise clustering.

Why do I pursue this second-best approach? First, estimation techniques for the spatial-lag model and the spatial error model are well-developed for settings with a normal or binary response, a single estimating equation, and a well-defined neighborhood structure for specifying the connectivity matrix \citep{Ward2008}. In the present analysis we are not given these luxuries. As will be discussed in the next subsection, due to concerns about non-random sample selection in the second main outcome variable, we will have to employ double-equation selection models. Though there is some recent literature addressing selection models with a spatial error component \citep{Flores2012}, these types of models are not yet well-established and have yet to be implemented in statistical software. 

Second, there are computational limits; even if I were able to derive a spatial-lag model that allows for sample selection, estimating binary response spatial-lag models (which would be necessary for modeling the sample selection stage) is computationally extremely intensive, even for moderate numbers of observations \citep{Ward2002}, and would be prohibitively costly in the context of the approximately 13,000 grid-cells used in the present analysis. 

Finally, appropriately specifying the spatial dependence structure in the grid-cell data would be very challenging. Though one might expect spatial dependence to be strongest in the geographical vicinity of a given cell, thus calling for geographical weights in the connectivity matrix, there might also be country-wise dependence that is unrelated to the Euclidean distance between cells.

What are the econometric implications of merely clustering standard errors, instead of explicitly modeling the spatial dependence structure in the data? There are two parts to the answer. First, I discard potential diffusion effects, and estimate the relationship between petroleum and the two outcome variables as if the former only affected the latter locally. This potential misspecification will cause bias in the sense that the point estimates will lump together spatial feedback effects and short-term local effects (see \citealt[p. 44]{Ward2008} for more information on this distinction). Note, however, that this type of bias will not lead to false positives with respect to the effect of petroleum (or any other predictor variable) on ethnic salience. Even if we fail to distinguish between local and diffusion effects, the expected value of the respective coefficient will be zero if petroleum has no actual effect on the outcome of interest. See section \ref{Sec:NoteSpat} in the appendix for a formalized illustration of this argument.
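
The claim that ignoring diffusion does not produce false positives can be illustrated with a small simulation. The R sketch below is not part of the analysis: it assumes a toy geometry of 50 cells on a line, a row-standardized connectivity matrix linking adjacent cells, $\rho = 0.6$, and a true petroleum coefficient of zero, and then fits OLS while ignoring the spatial lag. All names and parameter values are illustrative.

\begin{lstlisting}
set.seed(1)
n   <- 50    # cells arranged on a line (toy geometry, an assumption)
rho <- 0.6   # assumed strength of spatial diffusion

# Row-standardized connectivity matrix: neighbors are adjacent cells.
W <- matrix(0, n, n)
for (i in 1:n) {
  nb <- c(i - 1, i + 1)
  nb <- nb[nb >= 1 & nb <= n]
  W[i, nb] <- 1 / length(nb)
}
A_inv <- solve(diag(n) - rho * W)   # (I - rho W)^{-1}

# Simulate under the null (beta = 0) and fit OLS without the spatial lag.
slopes <- replicate(500, {
  x <- rbinom(n, 1, 0.3)              # stand-in for a petroleum dummy
  y <- as.vector(A_inv %*% rnorm(n))  # y = (I - rho W)^{-1} (0 * x + eps)
  coef(lm(y ~ x))[["x"]]
})
mean(slopes)   # average OLS slope across replications
\end{lstlisting}

Under these assumptions, the average estimated slope across replications should be close to zero.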

Second, we need to consider the consequences of waiving the explicit specification of spatial error dependence, and instead estimating point estimates on the basis of an (incorrect) \emph{iid} assumption, and adjusting standard errors by clustering on countries. To understand the consequences of this simplification, it is important to note that for a broad class of models, not correctly specifying error dependence (of any kind, not just spatial) will only lead to biased standard error estimates, not biased point estimates. In fact, one can estimate any additive-normal spatial error model with OLS and obtain unbiased point estimates \citep[p. 66]{Ward2008}. Hence, estimation on the basis of a (faulty) \emph{iid} assumption should not induce additional bias in point estimates, but only yield incorrect standard errors. So how does using country-wise clustering instead of explicitly modeling error dependence affect standard errors? In fact, this approach will likely yield standard error estimates that are too conservative (i.e., too large). Clustering on countries essentially assumes that the only independent variance we observe in our data is across countries. Thus, for the purpose of estimating standard errors, all within-country variance is discarded. Consequently, the only independence assumption we require when we cluster on countries is that the presence and size of politically salient ethnic identities is unrelated across countries, which seems fairly reasonable. If we instead specified a spatial error model with a connectivity matrix capturing local dependencies, we would utilize at least some within-country variance for the purpose of statistical inference, thus applying a less conservative error dependence assumption than when clustering on countries. 
Hence, under most circumstances (specifically, if within-cluster error correlation is positive), country-wise clustering will yield larger, more conservative standard errors than explicitly modeling spatial error dependence.
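
To make the idea of country-wise clustering concrete, the following R sketch computes a cluster-robust (sandwich) variance-covariance matrix for a linear model on simulated data. It is a generic textbook construction without finite-sample corrections, not necessarily the exact procedure described in the appendix; for likelihood-based models, the analogous estimator would use cluster-wise sums of score contributions and the inverse Hessian. All data and names are made up.

\begin{lstlisting}
set.seed(2)
G <- 30; m <- 20                 # 30 clusters ("countries"), 20 cells each
country <- rep(1:G, each = m)
# Regressor and error both share a country-level component.
x <- rep(rnorm(G), each = m) + rnorm(G * m, sd = 0.5)
u <- rep(rnorm(G), each = m) + rnorm(G * m)
y <- 1 + 0 * x + u

X   <- cbind(1, x)
fit <- lm(y ~ x)
e   <- residuals(fit)
bread <- solve(crossprod(X))     # (X'X)^{-1}

# "Meat": sum over clusters g of (X_g' e_g)(X_g' e_g)'.
score_sums <- rowsum(X * e, group = country)  # G x 2 matrix of cluster sums
meat <- crossprod(score_sums)

V_cluster  <- bread %*% meat %*% bread
se_cluster <- sqrt(diag(V_cluster))
se_iid     <- sqrt(diag(vcov(fit)))
rbind(iid = se_iid, cluster = se_cluster)
\end{lstlisting}

In this simulated setting, where within-cluster correlation is positive by construction, the clustered standard error of the slope should exceed its \emph{iid} counterpart.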

Finally, it is worth noting that the discussion of whether it is appropriate to use clustering in lieu of properly defining the spatial error structure of the data is far more nuanced than I was able to account for in this brief exposition. For this reason, I provide a more detailed discussion in section \ref{App:Cluster} of the appendix.

\subsection{Nested Outcomes}

A second econometric issue that requires due attention arises from the nested structure of our two outcome variables. Specifically, whether we can observe the demographic size of a politically relevant ethnic group inhabiting the area covered by a given grid-cell is conditional on observing a group in the first place.
How this pair of outcome variables should be modeled, and whether they should be modeled jointly at all, depends on the assumptions we attribute to the data generating processes underlying them. In particular, appropriate model specification is a function of the degree to which the process determining whether a grid-cell is covered by an ethnic group can be assumed to be separate from the process determining said ethnic group's relative size.

\subsubsection{Conditional Independence}

To elaborate on this argument and illustrate its econometric consequences, let us start with the simpler, yet more restrictive of the two possible assumptions: That the two data generating processes are conditionally independent.

What does this mean substantively? Given the modeled covariates, the process determining the presence of a politically relevant territorial ethnic group in a given region of a country is assumed to be independent of the process determining the relative size of that group. In other words, we assume that no variables, except the ones we decide to model explicitly, affect both the presence and the size of politically relevant and territorially concentrated ethnic groups in a given location.

Even in the context of the simple theoretical framework presented in this paper, this assumption is implausible. Note that above, I explicitly argue that petroleum influences both processes: it makes mobilization along ethnic delineations more likely, and it should lead to politically salient identities encompassing relatively few individuals. Clearly, it is straightforward to assume that there are numerous other factors for which we could make very similar arguments; at the very least, other territorially concentrated sources of state revenue should have similar effects. Since we do not measure or model these other factors, and assuming that the hypotheses postulated above have at least some empirical merit, the conditional independence assumption would hence be violated. 

Although conditional independence is an implausible assumption, it deserves attention because it gives way to the simplest possible modeling strategy for our pair of nested outcome variables, the hurdle model (see \citealt[ch. 17]{Wooldridge2002}). Since the hurdle model, in contrast to the subsequently discussed selection model, has very desirable convergence properties, it will serve as a baseline for our empirical analyses.

For our present application, we define the following log-normal hurdle model. Let the first outcome variable, indicating the presence of politically salient ethnicity in a given cell, be determined as follows:
\begin{align*}
	y_1^{*} = \boldsymbol{Z}\gamma + \epsilon_1 \\
	y_1 =
	\begin{cases}
   		1 & \text{if } y_1^{*} \geq 0 \\
		0       & \text{else}
 	\end{cases}
\end{align*}
where $\boldsymbol{Z}$ is a matrix of covariates (including a constant), $\gamma$ is the accompanying coefficient vector, and $\epsilon_1 \sim N(0, 1)$. This is the textbook latent-variable Probit specification. 

Further, let the second outcome variable, indicating the relative group size of the smallest territorially concentrated ethnic group inhabiting a given cell's surface area, be specified as follows:
\begin{align*}
	y_2 =
	\begin{cases}
   		\exp(\boldsymbol{X}\beta + \epsilon_2) & \text{if } y_1 = 1 \\
		0       & \text{else}
 	\end{cases}
\end{align*}
Again, $\boldsymbol{X}$ is a covariate matrix, $\beta$ is the accompanying coefficient vector, and $\epsilon_2 \sim N(0, \sigma^2)$. Clearly, $y_2$ is only well-defined in cells where we observe at least one politically relevant and territorially concentrated ethnic group; hence, we arbitrarily set it to zero where this is not the case. If at least one group is observable, we specify a log-normal relationship between the outcome and the linear predictor. We do so for two reasons: First, this specification ensures that $\hat{y}_2 > 0$, which is sensible because observed group sizes are strictly positive. Second, this is appropriate because the cell-level relative group-size variable exhibits strong positive skew, which is conveniently captured using the log-normal distribution.

Specifying the likelihood of the observed outcomes $y_1$ and $y_2$ is straightforward given the above-stated expressions:
\[ L = \prod_{i=1}^N \Pr(y_{1i} = 0)^{d_i} \left[ \Pr(y_{1i} = 1) f(\ln(y_{2i}) \mid y_{1i} = 1) \right]^{(1 - d_i)},\]
where $d_i \in \{0,1\}$ indicates whether $y_{2i}$ is unobservable (and thus set to zero), and $f$ is the conditional pdf of $\ln(y_2)$. Because we assume conditional independence between the two processes generating $y_1$ and $y_2$, the conditional distribution $f(\ln(y_{2i}) \mid y_{1i} = 1)$ simplifies to the marginal distribution $f(\ln(y_{2i}))$, which we have assumed to be normal with mean $x_i \beta$ and standard deviation $\sigma$. Thus, taking logs and spelling out the underlying probability distributions yields the following log-likelihood for the data:
\[ \ln(L) = \sum_{i=1}^N \left[ d_i \ln \Phi(-z_i \gamma) + (1 - d_i) \ln \Phi(z_i \gamma) + (1-d_i) \ln \left( \frac{1}{\sigma} \phi \left(\frac{\ln(y_{2i}) - x_i \beta} {\sigma} \right) \right) \right], \]
with $\Phi$ and $\phi$ representing the standard normal cdf and pdf, respectively.

It is interesting to note that all terms involving either $\beta$ or $\gamma$ are additively separable. Hence, the two coefficient vectors can be fit independently; that is, $\beta$ may be estimated without knowledge of $\boldsymbol{Z}$, and $\gamma$ without knowledge of $\boldsymbol{X}$. In practical terms, this implies that maximizing the above stated log-likelihood yields the same estimates as fitting a Probit to $y_1$ and, on the cells where a group is observed, a log-normal regression to $y_2$ independently. 
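
The separability result can be checked numerically. The following R sketch, written under stated assumptions rather than reproducing the estimation code used for this paper, simulates data from a log-normal hurdle model with a single hypothetical binary regressor, maximizes the joint log-likelihood with base R's \texttt{optim} (used here as a stand-in for \emph{maxLik}), and compares the result with a separately fitted Probit and a linear regression of $\ln(y_2)$ on the cells with an observed group. The parameterization in terms of $\ln \sigma$ is an assumption made for numerical convenience.

\begin{lstlisting}
set.seed(3)
n  <- 2000
x  <- rbinom(n, 1, 0.3)                           # hypothetical petroleum dummy
y1 <- as.integer(0.5 + 0.4 * x + rnorm(n) >= 0)   # group present?
y2 <- ifelse(y1 == 1, exp(-2 + 0.3 * x + rnorm(n, sd = 1.5)), 0)
Z <- X <- cbind(1, x)

# Log-likelihood of the log-normal hurdle model;
# theta = (gamma, beta, log sigma).
hurdle_ll <- function(theta, y1, y2, Z, X) {
  kz <- ncol(Z); kx <- ncol(X)
  gamma <- theta[1:kz]
  beta  <- theta[kz + 1:kx]
  sigma <- exp(theta[kz + kx + 1])
  obs <- y1 == 1
  zg  <- as.vector(Z %*% gamma)
  xb  <- as.vector(X[obs, , drop = FALSE] %*% beta)
  sum(pnorm(-zg[!obs], log.p = TRUE)) +
    sum(pnorm(zg[obs], log.p = TRUE)) +
    sum(dnorm(log(y2[obs]), mean = xb, sd = sigma, log = TRUE))
}

# Joint maximization (fnscale = -1 turns optim into a maximizer).
ly    <- log(y2[y1 == 1])
start <- c(0, 0, mean(ly), 0, log(sd(ly)))
joint <- optim(start, hurdle_ll, y1 = y1, y2 = y2, Z = Z, X = X,
               method = "BFGS",
               control = list(fnscale = -1, reltol = 1e-10))

# Separate fits of the two parts.
probit <- glm(y1 ~ x, family = binomial(link = "probit"))
d_obs  <- data.frame(ly = ly, x = x[y1 == 1])
lognor <- lm(ly ~ x, data = d_obs)

round(rbind(joint    = joint$par[1:4],
            separate = c(coef(probit), coef(lognor))), 3)
\end{lstlisting}

Up to numerical tolerance, the two rows of the resulting table should coincide.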

\subsubsection{Correlated Errors}

As mentioned above, assuming conditional independence between the two involved data generating processes is implausible. Instead, it seems more appropriate to assume that there exists a multitude of factors that determine both whether we see a politically relevant ethnic group in a specific location, and the demographic size of that group. 

Under this assumption, the most suitable modeling strategy is the use of selection models, as originally proposed by \citet{Heckman1979}. Selection models represent dependence between the ``participation'' equation (is there an ethnic group?) and the ``amount'' equation (how large is the group?) by allowing correlation between the respective error terms. Specifically, the standard log-normal selection model is characterized as follows (see \citealt[ch. 17]{Wooldridge2002}).

The basic characterization of the equations determining the two outcomes is identical to the hurdle model, and will thus not be repeated. However, instead of assuming that $\epsilon_1$ and $\epsilon_2$ are independent, we assume that they follow a bivariate normal distribution with $E(\epsilon_1) = E(\epsilon_2) = 0$ and variance-covariance matrix 
$VC= \left( \begin{array}{cc}
1 & \rho \sigma \\
\rho \sigma & \sigma^2 
\end{array} \right) $,
where $\rho$ represents the Pearson correlation coefficient for the two error terms.

Based on these assumptions, the likelihood for the data may again be specified as follows:
\[ L = \prod_{i=1}^N \Pr(y_{1i} = 0)^{d_i} \left[ \Pr(y_{1i} = 1) f(\ln(y_{2i}) \mid y_{1i} = 1) \right]^{(1 - d_i)}.\]
However, because now we assume a specific dependence structure between $\epsilon_1$ and $\epsilon_2$, the conditional distribution $f$ does not collapse to its univariate normal marginal. Rather, considering the bivariate normal distribution of the errors, the likelihood becomes
\[ L = \prod_{i=1}^N \Phi(-z_i \gamma)^{d_i} \left[ \frac{1}{\sigma} \phi \left( \frac{\ln(y_{2i}) - x_i \beta}{\sigma}\right) \Phi \left(  \frac{z_i \gamma + \rho (\ln(y_{2i}) - x_i \beta) / \sigma}{\sqrt{1 - \rho^2}} \right) \right]^{(1 - d_i)},\]
see Amemiya (1985: ch. 10).
Hence, now we estimate an additional parameter $\rho$, which measures the degree to which unmeasured determinants of whether we see a politically relevant ethnic group in a given grid-cell are correlated with unmeasured determinants of the size of these groups. 
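
The form of the second factor in the likelihood follows from a standard property of the bivariate normal distribution. As a brief sketch in the notation used above (with $\sigma$ denoting the standard deviation of $\epsilon_2$), the contribution of a cell with an observed group can be factorized as
\[ \Pr(y_{1i} = 1) \, f(\ln(y_{2i}) \mid y_{1i} = 1) = f(\ln(y_{2i})) \, \Pr(y_{1i} = 1 \mid \ln(y_{2i})), \]
where $f(\ln(y_{2i})) = \frac{1}{\sigma} \phi(\epsilon_{2i} / \sigma)$ and $\epsilon_{2i} = \ln(y_{2i}) - x_i \beta$. Conditional on $\epsilon_{2i}$, the error $\epsilon_{1i}$ is normally distributed with mean $\rho \epsilon_{2i} / \sigma$ and variance $1 - \rho^2$, so that
\[ \Pr(y_{1i} = 1 \mid \ln(y_{2i})) = \Pr(\epsilon_{1i} \geq - z_i \gamma \mid \epsilon_{2i}) = \Phi \left( \frac{z_i \gamma + \rho \epsilon_{2i} / \sigma}{\sqrt{1 - \rho^2}} \right). \]
For $\rho = 0$, the last expression reduces to $\Phi(z_i \gamma)$, which is the corresponding term of the hurdle model.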

As shown by \citet{Heckman1979}, ignoring non-zero dependence between the two error terms leads to biased estimates for the censored outcome. Hence, given that conditional independence is implausible in the current context, it seems natural to conclude that we should rely exclusively on selection models. This point is reinforced by the fact that the log-normal hurdle model is actually nested in the log-normal selection model: if the correlation coefficient $\rho$ is zero, the selection model simplifies to the hurdle model. Unfortunately, however, selection models have significant caveats. In particular, if $\boldsymbol{X}$ and $\boldsymbol{Z}$ consist of the same set of explanatory variables, the model is only weakly identified, and maximum likelihood estimation frequently fails to converge \citep[p. 698]{Wooldridge2002}. Hence, for robust identification, we require at least one regressor that affects only one of the two equations. In the present context, this would imply identifying a variable that affects only the presence of a politically relevant ethnic group, but not its size (or the other way round). Unfortunately, I am not able to identify a variable that fulfills this exclusion criterion.
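
The nesting of the two specifications can be verified directly. The R sketch below implements the selection log-likelihood given above, with $\sigma$ and $\rho$ reparameterized as $\ln \sigma$ and $\operatorname{atanh}(\rho)$ (a common device for unconstrained optimization, assumed here rather than taken from the estimation code of this paper), and evaluates it at $\rho = 0$ on simulated data with hypothetical variable names. The result is compared with the hurdle log-likelihood at the same parameter values.

\begin{lstlisting}
# Log-likelihood of the log-normal selection model.
# theta = (gamma, beta, log sigma, atanh rho); the transformations keep
# sigma > 0 and -1 < rho < 1 during unconstrained optimization.
selection_ll <- function(theta, y1, y2, Z, X) {
  kz <- ncol(Z); kx <- ncol(X)
  gamma <- theta[1:kz]
  beta  <- theta[kz + 1:kx]
  sigma <- exp(theta[kz + kx + 1])
  rho   <- tanh(theta[kz + kx + 2])
  obs <- y1 == 1
  zg  <- as.vector(Z %*% gamma)
  e2  <- log(y2[obs]) - as.vector(X[obs, , drop = FALSE] %*% beta)
  ll0 <- pnorm(-zg[!obs], log.p = TRUE)
  ll1 <- dnorm(e2, sd = sigma, log = TRUE) +
    pnorm((zg[obs] + rho * e2 / sigma) / sqrt(1 - rho^2), log.p = TRUE)
  sum(ll0) + sum(ll1)
}

# Simulated toy data (independent errors; all values are made up).
set.seed(4)
n  <- 500
x  <- rbinom(n, 1, 0.3)
y1 <- as.integer(0.5 + 0.4 * x + rnorm(n) >= 0)
y2 <- ifelse(y1 == 1, exp(-2 + 0.3 * x + rnorm(n, sd = 1.5)), 0)
Z <- X <- cbind(1, x)

gamma <- c(0.5, 0.4); beta <- c(-2, 0.3); sigma <- 1.5
obs <- y1 == 1
zg  <- as.vector(Z %*% gamma)

# Hurdle log-likelihood, computed directly from its three components.
ll_hurdle <- sum(pnorm(-zg[!obs], log.p = TRUE)) +
  sum(pnorm(zg[obs], log.p = TRUE)) +
  sum(dnorm(log(y2[obs]), mean = as.vector(X[obs, ] %*% beta),
            sd = sigma, log = TRUE))

# Selection log-likelihood evaluated at rho = 0.
ll_sel_0 <- selection_ll(c(gamma, beta, log(sigma), atanh(0)),
                         y1, y2, Z, X)
all.equal(ll_sel_0, ll_hurdle)
\end{lstlisting}

A function of this form could be passed to a numerical optimizer; the weak identification discussed above is not addressed by this sketch.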

For these reasons, I adopt the strategy of consistently estimating both the hurdle- and the selection-specification in the subsequent empirical analyses. On the one hand, this strategy ensures that I realistically report, and deliberately highlight the extent of model uncertainty associated with the present analysis. Doing so is of key importance to prevent premature conclusions, and to guard against the possibility of ``fishing'' \citep{Humphreys2013}. On the other hand, given that we know about the caveats associated with the two modeling approaches, reporting the results obtained with both may enable us to use the combined evidence to draw more robust conclusions.

\subsection{Overview of Estimated Models}

Having addressed the main statistical challenges associated with the research design at hand, I will now briefly summarize the econometric strategy pursued.

In essence, I estimate four types of models, which are characterized by the variables included in estimation:

\begin{itemize}
	\item \emph{Model 1:} A bare-bones model only including the cell-level petroleum dummy as a predictor variable.
	\item \emph{Model 2:} A model including the country-level petroleum and country-level surface-area controls, targeted at identifying whether there is evidence of petroleum fields having a local, rather than only an indirect, effect on ethnic salience.
	\item \emph{Model 3:} A model including all suggested control variables, to control for potentially spurious relationships.
	\item \emph{Model 4:} A model where both (cell- and country-level) petroleum variables are interacted with the GREG measure of ethnolinguistic heterogeneity, in order to test the second conditional hypothesis.
\end{itemize}
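
For concreteness, the four specifications can be written as R model formulas. The sketch below follows the layout of the results tables, uses hypothetical variable names, and relies on a few made-up rows solely to make it runnable; it builds one covariate matrix per model.

\begin{lstlisting}
# Hypothetical variable names and made-up values, for illustration only.
cells <- data.frame(
  petroleum         = c(1, 0, 0, 1),
  petroleum_country = c(1, 1, 0, 1),
  area              = c(0.9, 0.9, 2.1, 0.3),   # mio. sqkm
  log_pop           = c(8.1, 6.5, 7.2, 9.0),
  log_gdp65         = c(6.8, 6.8, 7.9, 6.1),
  greg_min          = c(0.1, 1, 0.4, 0.25)
)

# One-sided formulas describing the right-hand side of each model.
specs <- list(
  model1 = ~ petroleum,
  model2 = ~ petroleum + petroleum_country + area,
  model3 = ~ petroleum + petroleum_country + area + log_pop + log_gdp65,
  model4 = ~ greg_min * (petroleum + petroleum_country)
)

# Covariate matrices (each including a constant).
design <- lapply(specs, model.matrix, data = cells)
sapply(design, ncol)   # number of coefficients per equation
\end{lstlisting}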

In all these models I simultaneously estimate the effects of the predictors on the presence and the size of territorially concentrated and politically relevant ethnic groups.\footnote{All estimation is performed in R using the \emph{maxLik} package and likelihood functions specified by the author.} Each model is estimated twice: once under the conditional independence assumption (i.e., using the log-normal hurdle specification), and once allowing for correlated error terms (i.e., using the log-normal selection model). After estimation, the fitted models' variance-covariance matrices are re-estimated on the basis of country-wise clustering (see section \ref{App:Cluster} in the appendix for details). Finally, to investigate the first qualifying hypothesis concerning the type of countries where we would expect the postulated effects to matter most, I estimate all models on two samples: a global sample, and a sample restricted to ex-colonial states, as defined in section \ref{Sec:4}.




\section{Results}
\label{Sec:6}

\subsection{Global Sample}

Tables \ref{Tab:Res1} and \ref{Tab:Res2} show the estimates obtained on the basis of the global grid-cell sample. The top rows show the estimates for the binary  group-presence response, and the bottom rows show the estimates for the (logged) relative group size response. $\hat{\sigma}$ is the estimated dispersion parameter for the second equation.

Contrary to what was announced in the previous section, no selection model estimates are shown. This is because none of the maximum likelihood estimation procedures for the selection models converged to a global maximum. All maximization routines ended in non-concave regions of the log-likelihood, thus producing nonsensical estimates.\footnote{Estimation was attempted in R, using the \emph{maxLik} package, and Stata 12, with various maximization methods, to no avail.} The results obtained with the hurdle specification, reported in tables \ref{Tab:Res1} and \ref{Tab:Res2}, help to explain why this is the case: there is virtually no signal in the data. None of the petroleum-related variables achieve statistical significance. In fact, the only variables with a p-value smaller than 5\% are country-level surface area, GDP per capita, and the GREG-based ethnolinguistic group size variable. Incidentally, all these variables' effects point in the anticipated direction.

Since none of the substantively interesting coefficients in the global sample models are distinguishable from zero at any standard level of significance, I forgo the calculation of first differences or other substantively meaningful quantities. Though in principle we cannot infer the significance levels of the conditional effects specified in model 4 from the standard errors of the constitutive terms \citep{Aia2003}, the respective z-values are so small that it seems safe to assume that there is no underlying significant effect.
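
To make the caveat about interaction terms concrete, consider a linear predictor containing $\beta_P P + \beta_G G + \beta_{PG} (P \times G)$, where $P$ denotes a petroleum variable and $G$ the GREG measure (these symbols are introduced here for illustration only). On the scale of the linear predictor, the effect of petroleum at a given value $G = g$ is $\beta_P + \beta_{PG} \, g$, and the variance of its estimate is
\[ \mathrm{Var}(\hat{\beta}_P + \hat{\beta}_{PG} \, g) = \mathrm{Var}(\hat{\beta}_P) + g^2 \, \mathrm{Var}(\hat{\beta}_{PG}) + 2 g \, \mathrm{Cov}(\hat{\beta}_P, \hat{\beta}_{PG}). \]
Because the covariance term is not reported alongside the coefficient estimates, the significance of the conditional effect cannot be read off the standard errors of the constitutive terms alone; in the Probit equation, the effect on the predicted probability additionally depends on the values of the remaining covariates.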

\singlespacing
\begin{table}[htbp]
\centering
\footnotesize
\tabcolsep=0.1cm
\begin{tabular}{lrllrlr}
\hline
 & \multicolumn{ 3}{c}{Model 1} & \multicolumn{ 3}{c}{Model 2} \\ 
 & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Hurdle}} \\  \hline
\textit{Presence of Group} & \multicolumn{1}{l}{} &  &  & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} \\ 
Petroleum & 0.03 &  & \multicolumn{1}{r}{\textit{0.22}} & 0.07 &  & \textit{0.24} \\ 
Petroleum (Country) & \multicolumn{1}{l}{} &  & \textit{} & -0.38 &  & \textit{0.25} \\ 
Country Area (mio. Sqkm) & \multicolumn{1}{l}{} &  & \textit{} & 0.02 &  & \textit{0.02} \\ 
Constant & 0.64 & ** & \multicolumn{1}{r}{\textit{0.19}} & 0.89 & ** & \textit{0.18} \\  \hline
\textit{Relative Group Size (logged)} & \multicolumn{1}{l}{} &  & \textit{} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} \\ 
Petroleum & 0.19 &  & \multicolumn{1}{r}{\textit{0.19}} & 0.17 &  & \textit{0.19} \\ 
Petroleum (Country) & \multicolumn{1}{l}{} &  & \textit{} & 0.12 &  & \textit{0.38} \\ 
Country Area (mio. Sqkm) & \multicolumn{1}{l}{} &  & \textit{} & -0.09 & ** & \textit{0.02} \\ 
Constant & -2.62 & ** & \multicolumn{1}{r}{\textit{0.31}} & -2.16 & ** & \textit{0.24}  \\ \hline
$\hat{\sigma}$  & 2.08 &  &  & 2.01 &  & \multicolumn{1}{l}{} \\ 
Pseudo logLik & -28521.61 &  &  & -28160.49 &  & \multicolumn{1}{l}{} \\ 
N & 13177 &  &  & 13177 &  & \multicolumn{1}{l}{} \\ \hline \hline 
\end{tabular}
\caption{Estimation results, global sample. Standard errors clustered on 161 countries in \textit{italic}. Signif. codes:  0.01 $**$, 0.05 $*$. }
\label{Tab:Res1}
\end{table}
\onehalfspacing

\singlespacing 
\begin{table}[htbp]
\centering
\footnotesize
\tabcolsep=0.1cm
\begin{tabular}{lrlrrll}
\hline
 & \multicolumn{ 3}{c}{Model 3} & \multicolumn{ 3}{c}{Model 4} \\ 
 & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Hurdle}} \\ \hline
\textit{Presence of Group} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \textit{} \\ 
Petroleum & -0.02 &  & \textit{0.14} & 0.20 &  & \multicolumn{1}{r}{\textit{0.20}} \\ 
Petroleum (Country) & -0.49 &  & \textit{0.40} & -0.43 &  & \multicolumn{1}{r}{\textit{0.36}} \\ 
Country Area (mio. Sqkm) & -0.09 & ** & \textit{0.02} & \multicolumn{1}{l}{} &  & \textit{} \\ 
Population (logged) & 0.06 &  & \textit{0.07} & \multicolumn{1}{l}{} &  & \textit{} \\ 
GDPpc 1965 (logged) & 0.45 & * & \textit{0.23} & \multicolumn{1}{l}{} &  & \textit{} \\ 
GREG min. rel. Size & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -1.31 & ** & \multicolumn{1}{r}{\textit{0.50}} \\ 
GREG$\times$ Petroleum & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -0.14 &  & \multicolumn{1}{r}{\textit{0.21}} \\ 
GREG $\times$  Cntr. Petr. & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & 0.45 &  & \multicolumn{1}{r}{\textit{0.59}} \\ 
Constant & 2.43 & ** & \textit{1.40} & 1.31 &  & \multicolumn{1}{r}{\textit{0.24}} \\  \hline
\textit{Relative Group Size (logged)} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \textit{} \\ 
Petroleum & 0.18 &  & \textit{0.20} & 0.19 &  & \multicolumn{1}{r}{\textit{0.20}} \\ 
Petroleum (Country) & -0.28 &  & \textit{0.31} & -0.67 &  & \multicolumn{1}{r}{\textit{0.48}} \\ 
Country Area (mio. Sqkm) & 0.02 &  & \textit{0.03} & \multicolumn{1}{l}{} &  & \textit{} \\ 
Population (logged) & -0.02 &  & \textit{0.05} & \multicolumn{1}{l}{} &  & \textit{} \\ 
GDPpc 1965 (logged) & -0.19 &  & \textit{0.14} & \multicolumn{1}{l}{} &  & \textit{} \\ 
GREG min. rel. Size & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & 1.52 & ** & \multicolumn{1}{r}{\textit{0.51}} \\ 
GREG $\times$  Petroleum & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -0.52 &  & \multicolumn{1}{r}{\textit{0.49}} \\ 
GREG $\times$  Cntr. Petr. & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & 0.94 &  & \multicolumn{1}{r}{\textit{0.87}} \\ 
Constant & -5.87 & ** & \textit{1.91} & -2.49 &  & \multicolumn{1}{r}{\textit{0.29}} \\  \hline
$\hat{\sigma}$  & 2.00 &  & \multicolumn{1}{l}{\textit{}} & 1.92 &  &  \\ 
Pseudo logLik & -25454.62 &  & \multicolumn{1}{l}{} & -26845.28 &  &  \\ 
N & 12012 &  & \multicolumn{1}{l}{} & 12949 &  &  \\  \hline \hline
\end{tabular}
\caption{Estimation results, global sample. Standard errors clustered on 113 (Model 3) / 161 (Model 4) countries in \textit{italic}. Signif. codes:  0.01 $**$, 0.05 $*$. }
\label{Tab:Res2}
\end{table}
\onehalfspacing

\subsection{Ex-Colonial Sample}

Now let us move from the global sample to the sample of 47 ex-colonial states in Sub-Saharan Africa and Asia. As discussed in section \ref{Sec:3}, I expect the core hypotheses to be most likely to apply in these cases.

Table \ref{Tab:Res3} shows the hurdle- and selection-model estimates for the two simplest model specifications: the bare-bones model 1, which includes only the cell-level petroleum dummy, and model 2, in which we additionally control for country-level petroleum production and country area.

Given these simple model specifications, the cell-level petroleum dummy clearly seems to have a positive and significant effect on the probability of observing a territorially concentrated and politically relevant ethnic group in a particular location. 

Moreover, in three out of the four estimated models, cell-level petroleum features a negative and statistically significant effect on group size. Hence, not only are we more likely to see politically relevant ethnic groups in petroleum-rich areas; those groups that we observe also tend to be particularly small with respect to their host country's total population. Interestingly, there is also some tentative evidence suggesting that country-level petroleum has a similar effect on relative group sizes, as evident in the hurdle specification of model 2.

Another noteworthy result is that the selection models' $\rho$ parameters imply strong and significant negative error-term correlation. Substantively, this means that unobserved factors that increase the likelihood of seeing a politically relevant ethnic group in a given location also tend to lead to smaller group sizes. Hence, it seems that these unobserved factors generally have effects similar to the one I postulate for petroleum.

\singlespacing 
\begin{table}[htbp]
\centering
\footnotesize
\tabcolsep=0.1cm
\begin{tabular}{llllrllrlrrll}
 \hline
 & \multicolumn{ 6}{c}{Model 1} & \multicolumn{ 6}{c}{Model 2} \\ 
 & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Selection}} & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Selection}} \\  \hline
\textit{Presence of Group} &  &  & \textit{} & \multicolumn{1}{c}{} & \multicolumn{1}{c}{} &  & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{} & \multicolumn{1}{c}{} & \multicolumn{1}{c}{} &  \\ 
Petroleum & \multicolumn{1}{r}{0.75} & ** & \multicolumn{1}{r}{\textit{0.18}} & 0.69 & ** & \multicolumn{1}{r}{\textit{0.18}} & 0.95 & ** & \textit{0.17} & 0.86 & ** & \multicolumn{1}{r}{\textit{0.22}} \\ 
Petroleum (Country) &  &  & \textit{} & \multicolumn{1}{l}{} &  & \textit{} & -0.63 &  & \textit{0.36} & -0.49 &  & \multicolumn{1}{r}{\textit{0.32}} \\ 
Country Area (mio. Sqkm) &  &  & \textit{} & \multicolumn{1}{l}{} &  & \textit{} & 0.43 & * & \textit{0.20} & 0.22 &  & \multicolumn{1}{r}{\textit{0.22}} \\ 
Constant & \multicolumn{1}{r}{0.96} & ** & \multicolumn{1}{r}{\textit{0.16}} & 0.96 & ** & \multicolumn{1}{r}{\textit{0.16}} & 0.85 & ** & \textit{0.27} & 1.01 & ** & \multicolumn{1}{r}{\textit{0.25}} \\  \hline
\textit{Relative Group Size (logged)} &  &  & \textit{} & \multicolumn{1}{l}{} &  & \textit{} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \textit{} \\ 
Petroleum & \multicolumn{1}{r}{-0.51} & * & \multicolumn{1}{r}{\textit{0.23}} & -0.76 & ** & \multicolumn{1}{r}{\textit{0.25}} & -0.23 &  & \textit{0.19} & -0.64 & ** & \multicolumn{1}{r}{\textit{0.19}} \\ 
Petroleum (Country) &  &  & \textit{} & \multicolumn{1}{l}{} &  & \textit{} & -0.72 & * & \textit{0.33} & -0.33 &  & \multicolumn{1}{r}{\textit{0.39}} \\ 
Country Area (mio. Sqkm) &  &  & \textit{} & \multicolumn{1}{l}{} &  & \textit{} & -0.21 & * & \textit{0.10} & -0.47 & ** & \multicolumn{1}{r}{\textit{0.15}} \\ 
Constant & \multicolumn{1}{r}{-2.49} & ** & \multicolumn{1}{r}{\textit{0.17}} & -2.13 & ** & \multicolumn{1}{r}{\textit{0.27}} & -1.71 & ** & \textit{0.23} & -1.17 & ** & \multicolumn{1}{r}{\textit{0.26}} \\  \hline
$\hat{\sigma}$  & \multicolumn{1}{r}{1.29} &  & \textit{} & 1.48 &  & \textit{} & 1.20 &  & \multicolumn{1}{l}{\textit{}} & 1.49 &  & \textit{} \\ 
$\hat{\rho}$ &  &  & \textit{} & -0.80 & * & \textit{} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -0.99 & ** & \textit{} \\ 
Pseudo logLik & \multicolumn{1}{r}{-5204.13} &  & \textit{} & -5192.34 &  & \textit{} & -4959.61 &  & \multicolumn{1}{l}{\textit{}} & -4860.68 &  & \textit{} \\ 
N & \multicolumn{1}{r}{2827} &  & \textit{} & 2827 &  & \textit{} & 2827 &  & \multicolumn{1}{l}{} & 2827 &  & \textit{} \\  \hline  \hline
\end{tabular}
\caption{Estimation results, ex-colonial sample. Standard errors clustered on 47 countries in \textit{italic}. Signif. codes:  0.01 $**$, 0.05 $*$. }
\label{Tab:Res3}
\end{table}
\onehalfspacing

Table \ref{Tab:Res4} shows the results of estimating the hurdle- and selection-model specifications with the additional control variables. Evidently, the addition of GDP per capita in 1965 and cell-level population does not change the petroleum-related results substantively.\footnote{Though, due to missing data for those countries that had not yet gained independence by 1965, the number of cases is smaller.} Cell-level petroleum still features a clearly identifiable positive effect on the probability of observing a politically relevant ethnic group in a given location. Again, the effect of local petroleum production on group size is slightly more ambiguous, with the hurdle specification indicating no significant effect, and the selection specification suggesting the contrary.

\singlespacing 
\begin{table}[htbp]
\centering
\footnotesize
\tabcolsep=0.1cm
\begin{tabular}{lrlrrlr}
 \hline
 & \multicolumn{ 6}{c}{Model 3} \\ 
 & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Selection}} \\   \hline
\textit{Presence of Group} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{} & \multicolumn{1}{c}{} & \multicolumn{1}{c}{} & \multicolumn{1}{l}{} \\ 
Petroleum & 0.71 & ** & \textit{0.12} & 0.79 & ** & \textit{0.14} \\ 
Petroleum (Country) & -0.64 &  & \textit{0.43} & -0.66 &  & \textit{0.39} \\ 
Country Area (mio. Sqkm) & 0.41 &  & \textit{0.22} & 0.32 & * & \textit{0.14} \\ 
Population (logged) & 0.14 & * & \textit{0.07} & 0.14 & ** & \textit{0.05} \\ 
GDPpc 1965 (logged) & 0.11 &  & \textit{0.23} & -0.03 &  & \textit{0.24} \\ 
Constant & -1.34 & ** & \textit{1.59} & -0.40 &  & \textit{1.77} \\   \hline
\textit{Relative Group Size (logged)} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} \\ 
Petroleum & -0.21 &  & \textit{0.20} & -0.45 & * & \textit{0.19} \\ 
Petroleum (Country) & -1.21 & ** & \textit{0.31} & -0.91 & * & \textit{0.37} \\ 
Country Area (mio. Sqkm) & -0.16 &  & \textit{0.10} & -0.33 & * & \textit{0.14} \\ 
Population (logged) & 0.10 &  & \textit{0.06} & 0.04 &  & \textit{0.07} \\ 
GDPpc 1965 (logged) & 0.50 & * & \textit{0.20} & 0.42 &  & \textit{0.26} \\ 
Constant & -6.02 & ** & \textit{1.47} & -4.42 & * & \textit{1.96} \\   \hline
$\hat{\sigma}$ & 1.13 &  & \multicolumn{1}{l}{} & 1.34 &  & \multicolumn{1}{l}{\textit{}} \\ 
$\hat{\rho}$ & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{} & -0.98 & ** & \multicolumn{1}{l}{\textit{}} \\ 
Pseudo logLik & -4317.32 &  & \multicolumn{1}{l}{} & -4219.66 &  & \multicolumn{1}{l}{\textit{}} \\ 
N & 2560 &  & \multicolumn{1}{l}{} & 2560 &  & \multicolumn{1}{l}{\textit{}} \\   \hline  \hline
\end{tabular}
\caption{Estimation results, ex-colonial sample. Standard errors clustered on 39 countries in \textit{italic}. Signif. codes:  0.01 $**$, 0.05 $*$. }
\label{Tab:Res4}
\end{table}
\onehalfspacing

In order to obtain an overview of all the results reported so far, consider figures \ref{Fig:Res1} and \ref{Fig:Res2}.
Figure \ref{Fig:Res1} displays first differences in the probability of observing a politically relevant ethnic group in a given grid cell when moving from a petroleum-free cell to one featuring petroleum production, for all three models reported so far.\footnote{For the calculation of all first differences reported from here on, all covariates have been set to their mean values. The only exception is the country-level petroleum dummy, which is set to unity when analyzing the effects of cell-level petroleum production.} Across all model specifications, we observe the hypothesized positive effect.
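
As a minimal illustration of how such a first difference is constructed, assume that the presence equation has a probit link (an assumption made here for the sake of the example), and let $\Phi$ denote the standard normal distribution function, $\bar{x}'\hat{\beta}$ the linear predictor at the covariate means without petroleum, and $\hat{\beta}_P$ the coefficient on the cell-level petroleum dummy. The point estimate of the first difference is then $\Phi(\bar{x}'\hat{\beta} + \hat{\beta}_P) - \Phi(\bar{x}'\hat{\beta})$. For the bare-bones hurdle specification of model 1 in table \ref{Tab:Res3}, which contains no further covariates, this gives
\[
\Phi(0.96 + 0.75) - \Phi(0.96) \approx 0.956 - 0.831 \approx 0.12,
\]
that is, an increase of roughly 12 percentage points. This back-of-the-envelope calculation ignores estimation uncertainty, which the simulated confidence intervals in figure \ref{Fig:Res1} take into account.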

\begin{figure}[h]
	\centering
		\includegraphics[scale=.75]{Plots/pdpet_wcsa_prsel.pdf}
	\caption{First differences in predicted probabilities as a function of cell-level petroleum. Based on estimates from tables \ref{Tab:Res3} and \ref{Tab:Res4}. Vertical bars show simulated 95\% confidence intervals.}
	\label{Fig:Res1}
\end{figure}

Figure \ref{Fig:Res2} shows first differences in expected relative group size in a given grid cell when we move from 0 to 1 on the cell-level petroleum dummy. Here, the evidence is less conclusive. While the selection specification consistently speaks in favor of the hypothesized negative effect, the hurdle models including controls imply a null result. In principle, we should give precedence to the selection model results. The clearly non-zero estimates for the respective $\rho$ parameters would imply that the hurdle model, based on the assumption of conditional independence, is misspecified. However, given that the selection models were estimated without an exclusion restriction, I suggest refraining from strong conclusions.
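
The logic behind this statement can be sketched with the textbook sample selection model, which is assumed here to correspond to the selection specification; the symbols below are introduced for illustration only. Let $z_i^* = x_i'\beta + u_i$ denote the latent propensity for a politically relevant group to be present in cell $i$ (a group is observed if $z_i^* > 0$), and let $\log s_i = x_i'\gamma + e_i$ be the equation for logged relative group size, with $(u_i, e_i)$ bivariate normal, $\mathrm{Var}(u_i) = 1$, $\mathrm{Var}(e_i) = \sigma^2$, and $\mathrm{Corr}(u_i, e_i) = \rho$. Then
\[
\mathrm{E}\bigl[\log s_i \mid z_i^* > 0\bigr] = x_i'\gamma + \rho\,\sigma\,\frac{\phi(x_i'\beta)}{\Phi(x_i'\beta)},
\]
where $\phi$ and $\Phi$ are the standard normal density and distribution function. The hurdle model corresponds to $\rho = 0$, in which case the second term vanishes. If $\rho \neq 0$, omitting this term biases the estimates of $\gamma$. Without an exclusion restriction, however, the same covariates enter both equations, so that the correction term is identified only through the nonlinearity of the ratio $\phi/\Phi$, which is the reason for caution.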

\begin{figure}[h]
	\centering
		\includegraphics[scale=.75]{Plots/pdpet_wcsa_eout.pdf}
	\caption{First differences in expected group size as a function of cell-level petroleum. Based on estimates from tables \ref{Tab:Res3} and \ref{Tab:Res4}. Vertical bars show simulated 95\% confidence intervals.}
	\label{Fig:Res2}
\end{figure}

As briefly mentioned above, it seems that the country-level petroleum dummy also holds some explanatory power, in particular with respect to group size. 
Figure \ref{Fig:Res3} shows first differences in expected relative group size when we set the country-level petroleum dummy from zero to one. Quite clearly, three of the four specifications indicate a negative effect of country-level petroleum production on the relative size of ethnic groups.
A tentative interpretation of this finding would be that, as implied by the frameworks of Fearon (1999) and Caselli and Coleman (2006), the total amount of available political ``pork'' in a country leads to the emergence of salient ethnic groups in general, regardless of the particular locality of petroleum production. However, a definite answer to whether this is the case, even if only in quantitative terms, would require additional country-level evidence.

\begin{figure}[h]
	\centering
		\includegraphics[scale=.75]{Plots/cpdpet_wcsa_eout.pdf}
	\caption{First differences in expected group size as a function of country-level petroleum. Based on estimates from tables \ref{Tab:Res3} and \ref{Tab:Res4}. Vertical bars show simulated 95\% confidence intervals.}
	\label{Fig:Res3}
\end{figure}

Finally, consider table \ref{Tab:Res5}, showing the results obtained when interacting the petroleum variables with the ``GREG'' variable, measuring the relative size of local ethnolinguistic groups. 
Whereas in model 4a I use simple bivariate interaction terms to estimate the hypothesized conditional effect, model 4b is based on a slightly more complicated specification in which the GREG-based variable is also squared. I have added the latter specification because first difference calculations based on the simpler alternative yielded implausibly large marginal effects at the limits of the GREG variable. This finding suggested that there might be a curvilinear conditional effect of petroleum on relative group size. Indeed, as is evident from table \ref{Tab:Res5}, such a curvilinear specification seems to provide a reasonably good fit to the data.

\singlespacing 
\begin{table}[htbp]
\centering
\footnotesize
\tabcolsep=0.1cm
\begin{tabular}{llrlrrlrrlrrlr}
 \hline
 &  & \multicolumn{ 6}{c}{Model 4a} & \multicolumn{ 6}{c}{Model 4b} \\ 
 &  & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Selection}} & \multicolumn{ 3}{c}{\textit{Hurdle}} & \multicolumn{ 3}{c}{\textit{Selection}} \\   \hline
 & \textit{Presence of Group} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{} & \multicolumn{1}{c}{} & \multicolumn{1}{c}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{} \\ 
 & Petroleum & 0.89 & ** & \textit{0.29} & 0.36 &  & \textit{0.35} & 0.89 & ** & \textit{0.29} & 0.18 &  & \textit{0.17} \\ 
 & Petroleum (Country) & 0.14 &  & \textit{0.35} & -0.04 &  & \textit{0.26} & 0.14 &  & \textit{0.36} & -0.06 &  & \textit{0.26} \\ 
 & GREG min. rel. Size & -0.21 &  & \textit{0.70} & -1.05 & ** & \textit{0.40} & -0.21 &  & \textit{0.71} & -1.24 & ** & \textit{0.37} \\ 
 & GREG $\times$  Petroleum & -0.18 &  & \textit{0.71} & 0.41 &  & \textit{0.79} & -0.18 &  & \textit{0.40} & 0.65 &  & \textit{0.43} \\ 
 & GREG $\times$  Cntr. Petr. & -2.08 & * & \textit{0.83} & -1.50 & ** & \textit{0.54} & -2.08 & * & \textit{0.87} & -1.31 & ** & \textit{0.50} \\ 
 & Constant & 1.22 & ** & \textit{0.30} & 1.45 & ** & \textit{0.25} & 1.22 & ** & \textit{0.30} & 1.49 & ** & \textit{0.25} \\   \hline
 & \textit{Relative Group Size (logged)} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} \\ 
 & Petroleum & -0.39 &  & \textit{0.20} & -0.62 & ** & \textit{0.20} & -0.53 & ** & \textit{0.16} & -0.76 & ** & \textit{0.16} \\ 
 & Petroleum (Country) & -1.00 & ** & \textit{0.32} & -1.08 & ** & \textit{0.35} & -1.08 & ** & \textit{0.34} & -1.06 & ** & \textit{0.36} \\ 
 & GREG min. rel. Size & 1.06 & ** & \textit{0.38} & 1.00 &  & \textit{0.51} & 2.28 &  & \textit{1.86} & 2.46 & * & \textit{1.18} \\ 
 & GREG$^2$ & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -1.43 &  & \textit{1.95} & -1.75 &  & \textit{1.06} \\ 
 & GREG $\times$  Petroleum & 2.52 & * & \textit{1.23} & 2.22 & * & \textit{1.06} & 6.18 & ** & \textit{1.33} & 6.13 & ** & \textit{1.60} \\ 
 & GREG$^2$ $\times$  Petroleum & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -6.06 & ** & \textit{1.80} & -6.46 & ** & \textit{1.92} \\ 
 & GREG $\times$  Cntr. Petr. & 1.63 & * & \textit{0.74} & 2.66 & ** & \textit{0.89} & 4.26 & * & \textit{2.00} & 2.59 &  & \textit{1.49} \\ 
 & GREG$^2$ $\times$  Cntr. Petr. & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -5.28 & * & \textit{2.10} & -0.69 &  & \textit{1.97} \\ 
 & Constant & -2.07 & ** & \textit{0.27} & -1.75 & ** & \textit{0.30} & -2.13 & ** & \textit{0.29} & -1.82 & ** & \textit{0.31} \\   \hline
 & $\hat{\sigma}$ & 1.14 &  & \multicolumn{1}{l}{\textit{}} & 1.36 &  & \multicolumn{1}{l}{\textit{}} & 1.12 &  & \multicolumn{1}{l}{\textit{}} & 1.34 &  & \multicolumn{1}{l}{\textit{}} \\ 
 & $\hat{\rho}$ & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -0.99 & ** & \multicolumn{1}{l}{\textit{}} & \multicolumn{1}{l}{} &  & \multicolumn{1}{l}{\textit{}} & -0.99 & ** & \multicolumn{1}{l}{\textit{}} \\ 
 & Pseudo logLik & -4639.83 &  & \multicolumn{1}{l}{\textit{}} & -4533.52 &  & \multicolumn{1}{l}{\textit{}} & -4595.04 &  & \multicolumn{1}{l}{\textit{}} & -4507.65 &  & \multicolumn{1}{l}{} \\ 
 & N & 2786 &  & \multicolumn{1}{l}{\textit{}} & 2786 &  & \multicolumn{1}{l}{\textit{}} & 2786 &  & \multicolumn{1}{l}{\textit{}} & 2786 &  & \multicolumn{1}{l}{} \\   \hline  \hline
\end{tabular}
\caption{Estimation results, ex-colonial sample. Standard errors clustered on 47 countries in \textit{italic}. Signif. codes:  0.01 $**$, 0.05 $*$. }
\label{Tab:Res5}
\end{table}
\onehalfspacing

Since interaction effects are difficult to interpret based purely on the point estimates of their constitutive terms, I have calculated first difference effects based on the results reported in table \ref{Tab:Res5}.

The top panel of figure \ref{Fig:Res4} shows first differences in the predicted probability of observing a politically relevant group in a given grid cell when setting the cell-level petroleum dummy from zero to one, conditional on different values of the GREG-based ethnolinguistic group size variable. The bottom panel shows the empirical distribution of the GREG-based variable for the estimation sample. The plotted effects are based on the selection specification of model 4b in table \ref{Tab:Res5}. Note that the effects based on the hurdle models are practically identical. The same applies to figure \ref{Fig:Res5}.

As before, the effect of cell-level petroleum on the probability of observing a politically relevant group seems to be positive, but there is little evidence for a conditional effect. This is also reflected in the large standard errors associated with the \emph{GREG $\times$ Petroleum} term in the top half of table \ref{Tab:Res5}.

Analogously to figure \ref{Fig:Res4}, figure \ref{Fig:Res5} shows first differences in expected relative group size when moving from zero to one on the cell-level petroleum dummy. Note that the x-axis only extends to 0.5 (instead of 1), in order to highlight the effect estimated for small values on the GREG variable. 
In particular, the estimates seem to show tentative evidence in favor of our second conditional hypothesis. It does seem to be the case that cell-level petroleum only has a negative effect on the relative size of politically salient ethnic groups where there is sufficient local ethnolinguistic heterogeneity. In contrast, petroleum does not seem to have an effect on the demographic size of politically salient ethnic identities in regions where individuals belong to relatively large linguistic groups. Specifically, figure \ref{Fig:Res5} would suggest that we only observe a statistically significant effect in regions inhabited by a linguistic group that comprises less than approximately 8\% of a country's population. 
Though this might seem like a very small threshold, note that in the given sample of ex-colonial countries, the median ethnolinguistic group size observed in the grid-cell data is around 3.5\%.
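
To illustrate how this conditional effect arises from the coefficients, write $G$ for the GREG-based variable. In the group size equation of model 4b, the shift in the linear predictor of logged relative group size when the cell-level petroleum dummy moves from zero to one is
\[
\Delta(G) = \beta_{P} + \beta_{GP}\,G + \beta_{G^2P}\,G^2,
\]
where the subscripts are shorthand introduced here for the \emph{Petroleum}, \emph{GREG $\times$ Petroleum}, and \emph{GREG$^2$ $\times$ Petroleum} terms. Plugging in the selection-specification point estimates from table \ref{Tab:Res5} gives $\Delta(G) = -0.76 + 6.13\,G - 6.46\,G^2$, so that $\Delta(0) = -0.76$ and, at a group size of $G = 0.035$, $\Delta(0.035) \approx -0.55$. The positive linear interaction term thus pulls the negative effect towards zero as $G$ increases from small values. Note that this calculation refers to the linear predictor on the log scale only. The first differences in figure \ref{Fig:Res5} are expressed on the scale of expected group size and incorporate simulation-based uncertainty, so the two need not coincide numerically.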

\begin{figure}[h]
	\centering
		\includegraphics[scale=.75]{Plots/pdpet_wcsa_GREG_prsel_sel.pdf}
	\caption{First differences in predicted probabilities as a function of cell-level petroleum, conditional on ethnolinguistic heterogeneity. Based on estimates from table \ref{Tab:Res5}, model 4b. Dotted lines show simulated 95\% confidence intervals.}
	\label{Fig:Res4}
\end{figure}

\begin{figure}[h]
	\centering
		\includegraphics[scale=.75]{Plots/pdpet_wcsa_GREG_eout_sel.pdf}
	\caption{First differences in expected group size as a function of cell-level petroleum, conditional on ethnolinguistic heterogeneity. Based on estimates from table \ref{Tab:Res5}, model 4b. Dotted lines show simulated 95\% confidence intervals.}
	\label{Fig:Res5}
\end{figure}



\section{Conclusion}
\label{Sec:7}

Does petroleum extraction lead to the emergence of politically salient ethnic identities? Since this paper must still be considered work in progress, any answer given to this question based on the results reported above is tentative. Nevertheless, the data does seem to show some more or less consistent patterns that lend themselves to interpretation.

First, whether the relationship under scrutiny exists seems to be highly dependent on the subset of countries under consideration. In particular, whereas the global sample has yielded no evidence for a petroleum-related effect on ethnic salience, there appears to be a clearly identifiable signal when we restrict the analysis to the ex-colonial sample. While, given our existing knowledge of the dynamics of ethnic politics and their prevalence in the ex-colonial world, this conditional property does not come as a surprise, it should raise caution with regard to the ``resource curse'' literature in general. Many publications in this area tend to present alternative mechanisms for explaining the alleged effects of resource extraction on conflict, autocracy, or low economic growth as mutually exclusive, or ``competing''. The conditional nature of the above-reported effect of petroleum on ethnic salience, however, would suggest that the pathway from resource extraction to political outcomes may vary considerably across countries. For instance, given these results, it appears plausible to postulate that for the cases of Sub-Saharan Africa and developing Asia, the often reported link between petroleum extraction and civil war operates through the mobilization of previously irrelevant ethnic identities. Simultaneously, the same relationship may have very different causes in the Middle East and North Africa, where ethnic identities are generally less fluid, and ethnolinguistic heterogeneity is less pronounced. However, the marked differences in the obtained estimation results when moving from the global sample to the ex-colonial sample also highlight that further research determining the exact underlying causes for this conditionality is necessary.
As mentioned above, the presently used criteria for defining the ex-colonial sample are unsatisfyingly arbitrary, and it would be highly desirable to identify substantively meaningful variables that help further explain where petroleum leads to the emergence of politically salient ethnic identities, and where it does not.

Second, probably the most important result established so far is that for the ex-colonial sample, there seems to be fairly consistent evidence of petroleum production having an effect on both the emergence of politically relevant ethnic identities, and the relative size of those groups that become relevant. This finding lends support to instrumentalist theories of ethnic salience, which stress the establishment of ethnicity-based political coalitions for the purpose of achieving tangible political ends. Moreover, in accordance with the constructivist understanding of ethnicity more generally, the fact that there is a measurable effect of petroleum on the emergence of salient ethnicity clearly speaks against primordialist views of ethnic identity as a natural constant in human decision making; rather, ethnic identities emerge as a result of social processes.

Despite these encouraging results, it must be noted that even for the ex-colonial sample, the evidence is not conclusive with respect to distinguishing between local and country-level effects. Whereas the former would speak in favor of the theoretical framework presented in this paper, which provides several arguments for expecting a spatial association between petroleum extraction and ethnic mobilization, an effect operating purely at the country level would rather point towards causal mechanisms that operate via central state policy. One potentially useful strategy for further disentangling these two effects would be to reinvestigate the use of spatial models capable of capturing geographic diffusion more appropriately, despite the econometric challenges discussed in section \ref{Sec:5}.

Finally, the results obtained so far provide some tentative evidence for the proposition that the emergence of small politically relevant ethnic groups in a petroleum-producing region is dependent on the presence of sufficient social heterogeneity, at least with respect to ethnolinguistic differences. This is an interesting result, since, if it proves to be robust, it may provide a useful framework for assessing where we should anticipate ethnic mobilization, and its potentially adverse consequences, in response to newly discovered resources. Further research in this direction should focus on obtaining more reliable geo-coded data on social heterogeneity, in particular with respect to ethnolinguistic diversity, to overcome the uncertainty associated with the AMS data used in the present paper.



\newpage
\singlespacing
\bibliographystyle{chicago}
\bibliography{eprembib}

\newpage
\onehalfspacing
\section{Appendix}
\label{Sec:8}

\subsection{Descriptive Statistics}
\label{Sec:Desc}

Table \ref{Tab:App1} summarizes all variables used in the statistical analyses presented in section \ref{Sec:4}.

\begin{table}[htbp]
\centering
\footnotesize
\tabcolsep=0.1cm
\begin{tabular}{lrrrrrrr}
\hline
\textit{Global Sample} & \textit{N} & \textit{Min.} & \textit{Max.} & \textit{Median} & \textit{Mean} & \textit{St. Dev.} & \textit{N (Cntr.)} \\  \hline
Group Presence & 13177 & 0.000 & 1.000 & 1.000 & 0.740 & \multicolumn{1}{l}{} & 161 \\ 
Relative Group Size & 9749 & 0.000 & 0.979 & 0.080 & 0.285 & 0.334 & 119 \\ 
Petroleum & 13177 & 0.000 & 1.000 & 0.000 & 0.195 & \multicolumn{1}{l}{} & 161 \\ 
Petroleum (Country) & 13177 & 0.000 & 1.000 & 1.000 & 0.896 & \multicolumn{1}{l}{} & 161 \\ 
Country Area (mio. Sqkm) & 13177 & 0.002 & 16.781 & 2.782 & 5.943 & 5.526 & 161 \\ 
Population (logged) & 13177 & 0.000 & 16.815 & 10.534 & 10.116 & 3.069 & 161 \\ 
GDPpc 1965 (logged) & 12012 & 5.577 & 11.353 & 8.503 & 8.155 & 1.276 & 113 \\ 
GREG min. rel. Size & 12949 & 0.000 & 1.000 & 0.015 & 0.276 & 0.396 & 161 \\  \hline
\textit{Ex-Colonial Sample} & &  & &  & & & \\  \hline
Group Presence & 2827 & 0.000 & 1.000 & 1.000 & 0.840 & \multicolumn{1}{l}{} & 47 \\ 
Relative Group Size & 2376 & 0.001 & 0.964 & 0.080 & 0.167 & 0.224 & 39 \\ 
Petroleum & 2827 & 0.000 & 1.000 & 0.000 & 0.081 & \multicolumn{1}{l}{} & 47 \\ 
Petroleum (Country) & 2827 & 0.000 & 1.000 & 1.000 & 0.693 & \multicolumn{1}{l}{} & 47 \\ 
Country Area (mio. Sqkm) & 2827 & 0.017 & 3.160 & 1.159 & 1.383 & 0.942 & 47 \\ 
Population (logged) & 2827 & 4.879 & 16.815 & 11.590 & 11.503 & 2.135 & 47 \\ 
GDPpc 1965 (logged) & 2560 & 5.577 & 8.773 & 6.859 & 6.804 & 0.487 & 39 \\ 
GREG min. rel. Size & 2786 & 0.000 & 1.000 & 0.036 & 0.162 & 0.274 & 47 \\  \hline \hline
\end{tabular}
\caption{Summary statistics.}
\label{Tab:App1}
\end{table}

Figures \ref{Map:App1} through \ref{Map:App4} chart the equal-area grid cells of 100 km edge length, with information on relative group size and petroleum production, for the global and the restricted ex-colonial samples.

\begin{figure}[!h]
\centering
    \fbox{\includegraphics[angle=90, scale=.32]{Maps/global_groupsize_borders.png}}
    \caption{Relative group size of politically relevant, geographically concentrated ethnic groups in 2009 (from GeoEPR). Darker shadings indicate greater group-size. Diagonal pattern indicates absence of politically relevant group.}
	\label{Map:App1}
\end{figure}

\begin{figure}[!h]
\centering
    \fbox{\includegraphics[angle=90, scale=.32]{Maps/global_petrodata_borders.png}}
    \caption{Cells with productive petroleum fields, 2007 (from PETRODATA).}
	\label{Map:App2}
\end{figure}

\begin{figure}[!h]
\centering
    \fbox{\includegraphics[angle=90, scale=.32]{Maps/wcsa_groupsize_borders.png}}
    \caption{Relative group size of politically relevant, geographically concentrated ethnic groups in 2009  (from GeoEPR). Ex-colonial sample. Darker shadings indicate greater group-size. Diagonal pattern indicates absence of politically relevant group.}
	\label{Map:App3}
\end{figure}

\begin{figure}[!h]
\centering
    \fbox{\includegraphics[angle=90, scale=.32]{Maps/wcsa_petro_borders.png}}
    \caption{Cells with productive petroleum fields, 2007 (from PETRODATA). Ex-colonial sample.}
	\label{Map:App4}
\end{figure}

\end{document}